Identifying Structural Function of Scientific Literature Abstracts Based on Deep Active Learning

Abstract: [Objective] This paper examines how different deep active learning methods affect recognition performance and labeling cost when identifying the structural function of scientific literature abstracts. [Methods] We propose a method for identifying the structural function of abstracts based on active learning and sequence labeling. We construct a SciBERT-BiLSTM-CRF model for abstracts (SBCA) that uses the contextual sequence information between sentences, propose uncertainty-based active learning strategies at two levels (the single sentence and the full abstract), and run experiments on the PubMed 20K dataset. [Results] The SBCA model achieved the best recognition performance, raising the F1 value by 11.93 percentage points over a SciBERT model that ignores sequence information. With the Least Confidence strategy based on whole abstracts, reaching the SBCA model's optimal F1 value required only 60% of the data; with the Least Confidence strategy based on single sentences, it required 65%. [Limitations] This study constructed only uncertainty-based active learning query strategies and did not consider other categories of query strategy; future work should examine different active learning strategies in more fields or on multilingual datasets. [Conclusions] Deep active learning helps identify the structural function of abstracts at a lower annotation cost.
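The two query granularities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how sentence-level uncertainty is computed from the CRF output or how it is aggregated over an abstract, so the sketch assumes per-sentence class probabilities are available and takes the mean sentence score as the abstract score. All function names and the toy `pool` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: abstract id -> one class-probability vector per sentence.
Probs = List[List[float]]


def sentence_least_confidence(probs: List[float]) -> float:
    """Least Confidence score of one sentence: 1 minus the top class probability."""
    return 1.0 - max(probs)


def abstract_least_confidence(abstract: Probs) -> float:
    """Assumed aggregation: mean Least Confidence score over the sentences."""
    scores = [sentence_least_confidence(p) for p in abstract]
    return sum(scores) / len(scores)


def select_abstracts(pool: Dict[str, Probs], k: int) -> List[str]:
    """Abstract-level query: pick the k most uncertain whole abstracts."""
    ranked = sorted(
        pool, key=lambda aid: abstract_least_confidence(pool[aid]), reverse=True
    )
    return ranked[:k]


def select_sentences(pool: Dict[str, Probs], k: int) -> List[Tuple[str, int]]:
    """Sentence-level query: pick the k most uncertain (abstract id, sentence index) pairs."""
    scored = [
        (sentence_least_confidence(p), aid, i)
        for aid, abstract in pool.items()
        for i, p in enumerate(abstract)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(aid, i) for _, aid, i in scored[:k]]


# Toy unlabeled pool with three classes per sentence (made-up numbers).
pool = {
    "a1": [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],
    "a2": [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],
    "a3": [[0.60, 0.30, 0.10], [0.95, 0.03, 0.02]],
}

print(select_abstracts(pool, 2))
print(select_sentences(pool, 3))
```

In an active learning loop, the selected items would be sent for annotation, added to the labeled set, and the model retrained before the next query round.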

Keywords:

Deep Learning; Document Structural Function Identification; Move; Active Learning; Knowledge Organization

Authors:

Mao Jin (毛进), Chen Ziyang (陈子洋)

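Affiliations:

Center for Studies of Information Resources, Wuhan University, Wuhan 430072

School of Information Management, Wuhan University, Wuhan 430072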

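Funding:

National Natural Science Foundation of China; Major Project of Key Research Bases of Humanities and Social Sciences in Universities

Grant numbers:

72174154; 22JJD870005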

Publication year:

2024

DOI:

10.11925/infotech.2096-3467.2023.0448

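Data Analysis and Knowledge Discovery

National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD; CSSCI; CHSSCD; Peking University Core Journals; EI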

Impact factor: 1.452

ISSN: 2096-3467

Year, Volume (Issue): 2024, 8(6)