Weak Semantic Samples-based Contrastive Learning for Sentence Embeddings

XU Binbin¹, YAN Dachuan¹, WANG Jianshang¹, WANG Xiaomin¹

Author information

1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070
Abstract

To effectively eliminate the anisotropy of sentence embeddings in the semantic feature space, a contrastive learning sentence embedding method based on weak semantic samples is proposed. It aims to generate effective sentence embeddings while improving the model's recognition of textual semantic similarity. First, similar samples are constructed with a token repetition algorithm and fed into a masked language model (MLM), which predicts and generates samples that bear a weak semantic relationship to the originals. Then, each original sample is fed repeatedly into Transformer encoders with different dropout rates to extract different global semantic features. Finally, contrastive learning adjusts the feature weights to produce the sentence embeddings. A series of comparative experiments on public datasets shows that the weak-semantic-sample-based sentence embedding method outperforms the other methods compared, reaching the highest similarity evaluation score, 77.38%. It thus offers an effective solution for sentence embedding generation and semantic similarity recognition.
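The first step of the method (token repetition, then MLM input) can be illustrated with a minimal sketch. The abstract does not specify the token repetition algorithm, so this assumes an ESimCSE-style word repetition: a random number of tokens, at most `max(2, int(dup_rate * len))`, is duplicated in place. The repeated sequence is then partially masked so that an MLM could predict the masked positions. The function names `repeat_tokens` and `mask_for_mlm`, and the parameters `dup_rate=0.32` and `mask_rate=0.15`, are hypothetical. The MLM prediction itself (which would require a pretrained model such as BERT) is not shown.

```python
import random


def repeat_tokens(tokens, dup_rate=0.32, rng=None):
    """Duplicate a random subset of tokens in place (assumed ESimCSE-style repetition).

    The number of duplicated tokens is drawn uniformly from
    [0, max(2, int(dup_rate * len(tokens)))], capped at len(tokens).
    Parameter values are illustrative assumptions, not the paper's settings.
    """
    if rng is None:
        rng = random.Random()
    n = len(tokens)
    if n == 0:
        return []
    max_dup = min(n, max(2, int(dup_rate * n)))
    dup_len = rng.randint(0, max_dup)
    dup_positions = set(rng.sample(range(n), dup_len))
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i in dup_positions:
            out.append(tok)  # repeat this token immediately after itself
    return out


def mask_for_mlm(tokens, mask_rate=0.15, mask_token="[MASK]", rng=None):
    """Replace round(mask_rate * len) positions (at least one) with mask_token.

    The masked sequence is what a masked language model would receive; an MLM
    (not included here) would fill the masks to yield a weak semantic sample.
    """
    if rng is None:
        rng = random.Random()
    n = len(tokens)
    k = max(1, round(mask_rate * n)) if n else 0
    positions = set(rng.sample(range(n), k))
    return [mask_token if i in positions else t for i, t in enumerate(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the cat sat on the mat".split()
    repeated = repeat_tokens(sentence, rng=rng)
    print(repeated)
    print(mask_for_mlm(repeated, rng=rng))
```

Repetition changes sentence length while preserving token order, which is why it is a common way to build positives that differ in length from the original; whether the paper uses exactly this sampling rule is an assumption.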
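The dropout-view and contrastive steps can be sketched with a standard in-batch InfoNCE loss. For anchors $h_i$ and positives $h_i^+$ the loss is $-\log \frac{\exp(\cos(h_i, h_i^+)/\tau)}{\sum_j \exp(\cos(h_i, h_j^+)/\tau)}$, averaged over the batch. This is a hedged illustration, not the authors' implementation. Applying `dropout` directly to fixed vectors is a toy stand-in for passing the same sentence through Transformer encoders with different dropout rates. The temperature `tau=0.05` is borrowed from SimCSE as an assumption. The rates 0.1 and 0.2 in the usage example are hypothetical, and how the paper weights weak semantic samples inside the loss is not specified in the abstract, so it is not modeled here.

```python
import math
import random


def dropout(vec, p, rng):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in vec]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def info_nce(anchors, positives, tau=0.05):
    """Mean in-batch InfoNCE loss.

    Row i's positive is positives[i]; every other positives[j] is a negative.
    Uses the log-sum-exp trick for numerical stability.
    """
    total = 0.0
    for i, h in enumerate(anchors):
        logits = [cosine(h, z) / tau for z in positives]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in "encoder outputs" for a batch of 4 sentences (hypothetical data).
    sentences = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
    view_a = [dropout(s, 0.1, rng) for s in sentences]  # lower dropout rate
    view_b = [dropout(s, 0.2, rng) for s in sentences]  # higher dropout rate
    print(info_nce(view_a, view_b))
```

Pulling each sentence's two views together while pushing apart other sentences in the batch spreads embeddings across the space, which is the usual argument for why contrastive objectives reduce anisotropy.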

Key words

sentence embeddings/contrastive learning/weak semantic samples/textual semantic similarity

Publication year

2024

Journal of Lanzhou Jiaotong University

Lanzhou Jiaotong University

Impact factor: 0.532

ISSN: 1001-4373

Number of references: 23