Improvement of Text Similarity Algorithm Based on BERT
With the development of deep neural networks, more and more models have shifted from focusing on the literal meaning of natural language to focusing on its semantic information. To capture the semantic information of a text, a model needs to be trained on a large corpus. In 2018, BERT was introduced, establishing a pre-training paradigm for natural language. Text similarity algorithms based on the BERT pre-training model have since received extensive attention. Because BERT's attention mechanism does not model the timing (sequential) information of text, and this timing information is an important feature in text similarity comparison, this paper improves the BERT model to compensate for the timing information that BERT lacks. Finally, accuracy on the validation and test sets of the LCQMC dataset improves by 2.07% and 0.87%, reaching 89.39% and 87.41%, respectively.
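One common way to supplement BERT's attention mechanism with sequential information is to feed its token-level hidden states through a recurrent layer before pooling. The sketch below illustrates this general idea, not the paper's exact architecture: the `encoder` is a stand-in embedding layer (in practice a pre-trained BERT would supply the token states), and the BiLSTM head, pooling, and cosine-similarity scoring are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BertWithTemporalHead(nn.Module):
    """Sketch: augment a BERT-style encoder with a BiLSTM so the model
    also captures the order (timing) of tokens, which plain attention
    treats only through position embeddings."""

    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        # Stand-in for a pre-trained BERT producing token-level states.
        self.encoder = nn.Embedding(vocab_size, hidden)
        # Recurrent layer adds order-sensitive (timing) features.
        self.bilstm = nn.LSTM(hidden, hidden, batch_first=True,
                              bidirectional=True)
        self.proj = nn.Linear(2 * hidden, hidden)

    def embed(self, token_ids):
        h = self.encoder(token_ids)   # (batch, seq_len, hidden)
        seq, _ = self.bilstm(h)       # (batch, seq_len, 2*hidden)
        pooled = seq.mean(dim=1)      # mean-pool over the token axis
        return self.proj(pooled)      # (batch, hidden) sentence vector

    def forward(self, ids_a, ids_b):
        va, vb = self.embed(ids_a), self.embed(ids_b)
        # Similarity of the two sentence vectors, in [-1, 1].
        return nn.functional.cosine_similarity(va, vb)

model = BertWithTemporalHead()
a = torch.randint(0, 1000, (2, 8))  # two "tokenized" sentences
b = torch.randint(0, 1000, (2, 8))
scores = model(a, b)                # one similarity score per pair
print(scores.shape)                 # torch.Size([2])
```

For an actual similarity classifier, the cosine score (or the concatenated sentence vectors) would be passed to a classification layer and trained on labeled sentence pairs such as those in LCQMC.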
Keywords: deep neural network; BERT; semantic information; timing information