A Short-Text Semantic Matching Strategy Fusing Sememe Similarity Matrices with Dual Character and Word Vector Channels

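Chinese abstract (translated): The short-text semantic matching task aims to determine whether two short sentences have the same meaning. Many existing methods, however, suffer from insufficient semantic information in short texts and cannot reliably identify synonyms. To address these shortcomings, this paper proposes a short-text semantic matching strategy that fuses sememe similarity matrices with a dual channel of character and word vectors. First, the pre-trained model BERT encodes the input sentence pair. Then, for word-level semantic information, a FastText model is trained to obtain word vectors for the text, and a BiLSTM further extracts contextual semantic information. To make effective use of sememe information, multi-head attention and co-attention (the latter performing interactive computation over the separated vectors) are added to the two channels respectively, and the corresponding sememe similarity matrix is incorporated into each attention mechanism. Finally, the vectors from the two parts are combined to infer semantic consistency. Experiments on the financial-domain dataset BQ and the open-domain dataset LCQMC demonstrate the effectiveness of the proposed algorithm.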

English title: Short Text Semantic Matching Strategy Fusing Sememe Similarity Matrix and Dual-channel of Char-Word Vectors

English abstract: The purpose of the short text semantic matching task is to judge whether the semantics of two short sentences are consistent. However, many existing methods have shortcomings such as insufficient semantic information in short texts and an inability to identify synonyms effectively. In response to these shortcomings, this paper proposes a short text semantic matching strategy that fuses a sememe similarity matrix with a dual channel of char-word vectors. Firstly, the pre-trained model BERT is used to encode the input sentence pairs; for the word-level semantic information in the sentence, the FastText model is trained to obtain the word vectors of the text, and a BiLSTM model is added to further extract contextual semantic information. Secondly, to make effective use of sememe information, multi-head attention and co-attention (the latter for interactive calculation over the separated vectors) are added to the two channels respectively, and the corresponding sememe similarity matrix is integrated into each attention mechanism. Finally, semantic consistency is inferred from the combined vectors of the two parts. Experiments on the financial-domain dataset BQ and the open-domain dataset LCQMC demonstrate the effectiveness of the proposed algorithm.
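The abstract does not state how the sememe similarity matrix enters the attention computation. One plausible reading, sketched here purely as an assumption, is to add the matrix as a bias to the scaled dot-product logits before the softmax, so that token pairs sharing sememes receive more attention even when their vectors differ. The function name `co_attention` and the `weight` parameter are hypothetical and do not come from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def co_attention(a_vecs, b_vecs, sememe_sim, weight=1.0):
    """Attend from each token of sentence A over the tokens of sentence B.

    a_vecs, b_vecs: lists of token vectors with the same dimension.
    sememe_sim[i][j]: sememe similarity between token i of A and token j of B.
    weight: hypothetical scalar (an assumption of this sketch) that controls
            how strongly the sememe prior shifts the attention logits.
    """
    dim = len(a_vecs[0])
    scale = math.sqrt(dim)
    attended, all_weights = [], []
    for i, a in enumerate(a_vecs):
        # Scaled dot product plus an additive sememe bias (assumed fusion rule).
        logits = [
            sum(x * y for x, y in zip(a, b)) / scale + weight * sememe_sim[i][j]
            for j, b in enumerate(b_vecs)
        ]
        probs = softmax(logits)
        all_weights.append(probs)
        # Weighted sum of B's token vectors, one output vector per token of A.
        attended.append(
            [sum(p * b[k] for p, b in zip(probs, b_vecs)) for k in range(dim)]
        )
    return attended, all_weights

# Toy example: one token in A, two tokens in B.
a = [[1.0, 0.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
_, plain = co_attention(a, b, [[0.0, 0.0]])   # no sememe prior
_, biased = co_attention(a, b, [[0.0, 1.0]])  # second token of B shares sememes
```

In this toy case the unbiased weights favor the first token of B, whose vector matches A's token, while the sememe bias shifts most of the weight to the second token. That is the kind of synonym signal the abstract says vector similarity alone can miss.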

English keywords:

Natural language processing; Short text; Sememe; Co-attention; Char-Word vector

Authors:

Liu Dongxu (刘东旭), Duan Liguo (段利国), Cui Juanjuan (崔娟娟), Chang Xuanwei (常轩伟)

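Author affiliations:

College of Computer Science and Technology, Taiyuan University of Technology (太原理工大学), Jinzhong, Shanxi 030600

Shanxi University of Electronic Science and Technology (山西电子科技学院), Linfen, Shanxi 041000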

Keywords:

natural language processing; short text; sememe; co-attention; character and word vectors

Publication year:

2024

DOI:

10.11896/jsjkx.231100147

Computer Science (计算机科学)

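Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)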

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(12)