CAAI Transactions on Intelligent Systems, 2024, Vol. 19, Issue 3: 719-727. DOI: 10.11992/tis.202201020

An improved HiNT text retrieval model using BERT and coverage mechanism

DI Jian 1, LIU Junhua 1, CAO Jingang 1

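Author information

  • 1. School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, Hebei, China; Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, Baoding 071003, Hebei, China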

Abstract

To improve the accuracy of semantic text retrieval, this paper proposes an improved hierarchical neural matching model (HiNT) that addresses text ambiguity and polysemy, which current text retrieval models handle poorly when measuring the relevance between queries and documents. The model first extracts key topic words from each passage of a document and then encodes them into multiple dense semantic vectors using the bidirectional encoder representations from transformers (BERT) model. A local matching layer augmented with a coverage mechanism then processes these vectors, so that the model can compute relevance at both the local passage-level granularity and the global document-level granularity, improving retrieval accuracy. On the MS MARCO and webtext2019zh datasets, the proposed model is compared with multiple retrieval models and achieves the best results, which verifies its effectiveness.
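The two-level scoring described in the abstract can be illustrated with a small sketch. The abstract does not define the coverage mechanism or the aggregation step, so everything here is an assumption made for illustration: coverage is taken to be the fraction of query terms whose best match in a passage reaches a threshold, the document score is the mean of the top-k passage scores, and the toy 2-d vectors stand in for BERT embeddings. The function names `passage_score` and `document_score` are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def passage_score(query_vecs, passage_vecs, threshold=0.5):
    """Local (passage-level) score: mean best-match similarity scaled by coverage.

    Assumption: coverage is the fraction of query terms whose best match
    among the passage's key-term vectors reaches `threshold`.
    """
    if not query_vecs or not passage_vecs:
        return 0.0
    best = [max(cosine(q, p) for p in passage_vecs) for q in query_vecs]
    coverage = sum(1 for s in best if s >= threshold) / len(best)
    return coverage * sum(best) / len(best)


def document_score(query_vecs, passages, top_k=2):
    """Global (document-level) score: mean of the top-k passage scores (assumed)."""
    scores = sorted((passage_score(query_vecs, p) for p in passages), reverse=True)
    top = scores[:top_k]
    return sum(top) / len(top) if top else 0.0


# Toy 2-d vectors standing in for BERT embeddings of query terms and key topic words.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]]  # first passage matches both query terms
doc_b = [[[1.0, 0.0]], [[1.0, 0.0]]]              # second query term is never matched

print(document_score(query, doc_a), document_score(query, doc_b))
```

In this toy setup the document whose passage covers both query terms ranks above the one that repeatedly matches only a single term, which is the kind of behavior a coverage term is meant to encourage.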

Key words

bidirectional encoder representations from transformers/hierarchical neural matching model/coverage mechanism/text retrieval/semantic representation/feature extraction/natural language processing/similarity/multi-granularity

Funding

Fundamental Research Funds for the Central Universities (2021MS085)

Publication year

2024
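CAAI Transactions on Intelligent Systems
Chinese Association for Artificial Intelligence; Harbin Engineering University

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.672
ISSN: 1673-4785
Number of references: 4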