Research on Text Similarity Based on Interactive Features and Multi-scale Features

To address the low accuracy of text similarity computation caused by insufficient information transfer between sentences and by neglect of diverse semantic information, this paper proposes IF-MSF, a text similarity model that builds on bidirectional long short-term memory (BiLSTM) and combines interactive features with multi-scale features. First, BiLSTM encodes each sentence to extract a global feature matrix, and the two feature matrices are made to interact through a soft attention mechanism and through cosine similarity, so that semantic information is passed between them. Second, the two groups of interactive features are weighted to aggregate all interaction information, and BiLSTM re-encodes the weighted interactive features together with the initial encoded features to capture the difference information between them. Third, multi-scale convolution extracts diverse semantic features from the difference information, and a channel attention mechanism enhances the important features. Finally, the two groups of enhanced features are fused to decide whether the text pair is similar. Experiments on two datasets show that the model reaches its highest F1 scores of 88.15% and 85.03%, outperforming the other methods compared.
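The interaction step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the dot-product soft alignment, the softmax over cosine scores, and the fixed mixing weight `alpha` are all assumptions, and the small matrices `a` and `b` merely stand in for BiLSTM outputs. Every function name here is hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def weighted_sum(weights, rows):
    dim = len(rows[0])
    return [sum(w * r[k] for w, r in zip(weights, rows)) for k in range(dim)]

def soft_attention_interaction(a, b):
    # Assumption: dot-product scores, normalised per row of a,
    # then used to summarise b for each position of a.
    return [weighted_sum(softmax([dot(ai, bj) for bj in b]), b) for ai in a]

def cosine_interaction(a, b):
    # Assumption: same alignment, but scored by cosine similarity.
    return [weighted_sum(softmax([cosine(ai, bj) for bj in b]), b) for ai in a]

def combine(x, y, alpha=0.5):
    # Assumption: a fixed scalar weight; the paper may learn this weighting.
    return [[alpha * p + (1 - alpha) * q for p, q in zip(rx, ry)]
            for rx, ry in zip(x, y)]

# Toy stand-ins for the encoded feature matrices of two sentences.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

att = soft_attention_interaction(a, b)
cos = cosine_interaction(a, b)
mixed = combine(att, cos)
```

Each row of `mixed` has the same dimension as a row of `a`, so it could be concatenated with the initial encoding before a second encoder, which is the role the abstract assigns to the re-encoding BiLSTM.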

text similarity; bidirectional long short-term memory; interactive features; multi-scale features; channel attention

Yin Chunyong (尹春勇), Shen Zining (沈子宁)

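School of Computer Science and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044, China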

General Program of the National Natural Science Foundation of China

6177282

2024

Computer Technology and Development
Shaanxi Computer Society

CSTPCD
Impact factor: 0.621
ISSN: 1673-629X
Year, volume (issue): 2024, 34(8)