A Data Augmentation-Based Keyword Extraction Model for Scientific and Technical Literature

Cheng Rui (程芮)¹, Zhang Haijun (张海军)¹

Author information

1. School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054

Abstract

[Research purpose] Keyword extraction from scientific and technical literature is of significant research value. Existing keyword extraction methods have large errors and can only select keywords that already appear in the text, so they struggle to distill, from deep semantic information, words that better reflect the core theme of the text. This study addresses two limitations of keyword extraction: insufficient mining of implicit contextual semantics and insufficient attention to key information. [Research method] A data augmentation-based keyword extraction model, GPBA (GPT-2 BiLSTM Mul-Attention), is proposed. It uses a language model for data augmentation and combines this with a BiLSTM+Mul-Attention extraction model that fuses multiple features to understand semantic information. [Research conclusion] Experimental results show that GPBA performs better overall than the other baseline models and condenses and extracts keywords from text more precisely.
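The abstract reports that GPBA outperforms the baselines but does not name the metrics used. Keyword extraction is commonly scored with exact-match precision, recall, and F1 against author-assigned keywords, so the following is only a minimal sketch under that assumption. The function name `keyword_prf` and the example keyword lists are hypothetical and do not come from the paper.

```python
def keyword_prf(predicted, gold):
    """Exact-match precision, recall and F1 between two keyword lists.

    Assumption: matching is case-insensitive and ignores surrounding
    whitespace; the paper may use a different matching rule.
    """
    # Normalise and de-duplicate both lists.
    pred = {k.strip().lower() for k in predicted if k.strip()}
    ref = {k.strip().lower() for k in gold if k.strip()}

    hits = len(pred & ref)  # keywords found in both sets
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    # F1 is the harmonic mean; define it as 0 when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical model output and gold keywords for one document.
predicted = ["keyword extraction", "data augmentation", "neural network", "GPT-2"]
gold = ["Keyword Extraction", "data augmentation", "semantic information"]

p, r, f1 = keyword_prf(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case two of the four predicted keywords match two of the three gold keywords. Exact matching penalizes a model that produces a better but differently worded keyword, so a semantic or partial-match variant may suit the abstract's stated goal more closely.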

Keywords

scientific and technical literature / keyword extraction model / data augmentation / semantic information / evaluation metrics

Funding

Key Project of the National Natural Science Foundation of China (NSFC)-Xinjiang Joint Fund (U1703261)

Publication year

2024

Journal of Intelligence (情报杂志)

Shaanxi Institute of Scientific and Technical Information

Indexed in: CSTPCD / CSSCI / CHSSCD / Peking University Core Journals (北大核心)

Impact factor: 1.502

ISSN: 1002-1965

Number of references: 34
