Similarity measure for questions based on corrected-pairwise contrastive learning

WANG Haochang (王浩畅)¹, FENG Zhenyang (冯臻旸)¹, ZHENG Guanyu (郑冠彧)²

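Author information

1. School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
2. School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China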

Abstract

To address the low accuracy of contrastive learning methods when judging the similarity of hard question pairs, a question-pair similarity measure method based on corrected-pairwise contrastive learning is proposed. Semantic features are captured by a pretrained model combined with a bidirectional LSTM (Bi-LSTM). A self-attention mechanism focuses on key information, and an average pooling strategy compresses the features into a sentence embedding. Pairwise contrastive learning is introduced and combined with the label error to design a contrastive loss function with a penalty term, which corrects the similarity scores of hard examples and improves the separability of the semantic space. Experimental results show that the proposed method achieves better F1 values and accuracy than the baseline models in similarity measure for questions.
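The abstract does not state the loss formula, so the following is only a minimal sketch of what a pairwise contrastive loss with a label-error penalty might look like. The function names `cosine` and `corrected_pair_loss`, the hyperparameters `margin` and `penalty_weight`, and the specific form of the penalty (the absolute gap between the label and a rescaled similarity score) are all assumptions made for illustration, not the authors' definition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def corrected_pair_loss(u, v, label, margin=0.5, penalty_weight=1.0):
    """Pairwise contrastive loss plus a label-error penalty (assumed form).

    u, v: sentence embeddings of the two questions.
    label: 1 for a similar pair, 0 for a dissimilar pair.
    margin, penalty_weight: hypothetical hyperparameters.
    """
    sim = cosine(u, v)
    if label == 1:
        # Pull similar pairs together.
        base = (1.0 - sim) ** 2
    else:
        # Push dissimilar pairs below the margin.
        base = max(0.0, sim - margin) ** 2
    # Map the similarity from [-1, 1] to [0, 1] so it is comparable to the label.
    score = (sim + 1.0) / 2.0
    # Assumed penalty: grows with the gap between label and score,
    # so badly scored (hard) pairs contribute more to the loss.
    penalty = penalty_weight * abs(label - score)
    return base + penalty


easy_positive = corrected_pair_loss([1.0, 0.0], [1.0, 0.0], label=1)
hard_positive = corrected_pair_loss([1.0, 0.0], [0.0, 1.0], label=1)
print(easy_positive, hard_positive)
```

In this sketch a similar pair whose embeddings are nearly orthogonal (a hard example) receives a larger loss than one whose embeddings already coincide, which is the kind of correction the abstract describes in words.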

Key words

similarity measure/contrastive learning/sentence embedding/semantic representation/pretrained model/self-attention mechanism/natural language processing

Publication year

2024

Computer Engineering and Design (计算机工程与设计)

Institute 706, Second Academy of China Aerospace Science and Industry Corporation

Indexed in: CSTPCD, Peking University Core Journals

Impact factor: 0.617

ISSN: 1000-7024
