
A Data Augmentation-Based Keyword Extraction Model for Scientific and Technical Literature

[Research purpose] Keyword extraction from scientific and technical literature is of significant value. Existing keyword extraction methods, however, still show large errors and can only extract words that appear in the text. They have difficulty using deep semantic information to distill terms that better reflect the text's core theme. This study addresses two limitations of current keyword extraction: insufficient mining of implicit contextual semantics, and insufficient attention to key information. [Research method] It proposes GPBA (GPT-2 BiLSTM Mul-Attention), a keyword extraction model based on data augmentation. A language model (GPT-2) performs the data augmentation, and a BiLSTM+Mul-Attention extraction model fuses multiple semantic features to understand the text. [Research conclusion] Experimental results show that GPBA outperforms the other baseline models overall and condenses and extracts keywords from text more accurately.
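The abstract names the extraction component only as BiLSTM+Mul-Attention and gives no layer details. As a rough illustration of what a multi-head attention layer does to BiLSTM token states, the stdlib-only Python sketch below splits each token vector into heads, applies scaled dot-product self-attention within each head, and concatenates the results. It is not the authors' implementation. Reading "Mul-Attention" as multi-head attention is an assumption; the learned query, key, value and output projections of standard multi-head attention are omitted (treated as identity); and the input matrix is a hypothetical stand-in for BiLSTM outputs.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column matrix product.
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    # softmax(Q K^T / sqrt(d_k)) V; returns the outputs and the attention weights.
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(h: Matrix, num_heads: int) -> Matrix:
    """Self-attention over token states h (one row per token).

    Simplification (assumption): the per-head Q/K/V and output projections
    are identity, so each head attends over its own slice of the features.
    """
    d_model = len(h[0])
    if d_model % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d_model // num_heads
    fused: Matrix = [[] for _ in h]
    for i in range(num_heads):
        part = [row[i * d_head:(i + 1) * d_head] for row in h]
        out, _ = scaled_dot_product_attention(part, part, part)
        # Concatenate this head's output onto each token's fused vector.
        for token_vec, head_vec in zip(fused, out):
            token_vec.extend(head_vec)
    return fused


if __name__ == "__main__":
    # Hypothetical BiLSTM outputs: 3 tokens, 4 features (2 forward + 2 backward).
    states = [
        [1.0, 0.0, 0.5, 0.5],
        [0.0, 1.0, 0.5, 0.5],
        [1.0, 1.0, 0.0, 0.0],
    ]
    for row in multi_head_self_attention(states, num_heads=2):
        print([round(x, 3) for x in row])
```

With identity projections, the two heads in this example attend over the forward and backward halves of each state separately. In a trained model, the learned projections would instead decide what each head compares before a downstream layer labels keyword tokens.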

Keywords: scientific and technical literature; keyword extraction model; data augmentation; semantic information; evaluation metrics

Cheng Rui (程芮), Zhang Haijun (张海军)


School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054


Funding: Key Project of the Xinjiang Joint Fund of the National Natural Science Foundation of China (U1703261)

2024

Journal of Intelligence (情报杂志)
Shaanxi Institute of Scientific and Technical Information

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core (北大核心)
Impact factor: 1.502
ISSN: 1002-1965
Year, volume (issue): 2024, 43(1)