
Multi-level Functional Structure Recognition of Scientific Literature

Automatic recognition of the functional structure of scientific literature can improve the efficiency of tasks such as fine-grained information retrieval, keyword extraction, and citation analysis. Current research on functional structure recognition faces two problems: dependencies within the text are poorly represented, and models generalize and transfer poorly. To address them, this study uses graph convolutional networks (GCNs) to capture the inherent dependency information and topological structure among word nodes, which strengthens the model's ability to represent scientific text. It also introduces adversarial learning to improve the generalization ability of the recognition model. Using the ScienceDirect dataset, we evaluate multiple models on functional structure recognition at three levels: section headers (Header), section content (Section), and section paragraphs (Paragraph). We further test the cross-domain transferability of these models on PubMed-20k, a functional structure dataset of medical abstracts. The results show that at the Header level, BERT+GCN performs best, with an F1 score of 88%, 3% higher than the baseline model. At the Section level, BERT+GAN performs best, with an F1 score of 76%, also 3% higher than the baseline model. At the Paragraph level, the F1 score reaches 68%. BERT+GCN also transfers across domains better than the other models, achieving an F1 score of 90% on the cross-domain data.
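The abstract does not say how the word graph is built or how the GCN output is combined with BERT. As a rough illustration only, the sketch below implements a single graph-convolution layer in the widely used Kipf and Welling form, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It assumes a binary adjacency matrix over word nodes (for example, from dependency or co-occurrence edges) and uses plain Python lists instead of a deep-learning framework. The function names `normalize_adjacency`, `matmul`, and `gcn_layer` are hypothetical and are not taken from the paper.

```python
import math


def normalize_adjacency(adj):
    """Return the symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    # Add self-loops so each word node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Inverse square roots of node degrees in A + I (degree >= 1 thanks to self-loops).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [
        [inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]


def matmul(x, y):
    """Plain-list matrix product x @ y."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(adj, features, weight):
    """One graph-convolution layer: ReLU(A_norm @ H @ W).

    adj:      n x n 0/1 adjacency matrix over word nodes (assumed input)
    features: n x d node feature matrix H (e.g. word embeddings)
    weight:   d x d_out weight matrix W (learned in practice, fixed here)
    """
    a_norm = normalize_adjacency(adj)
    # Each node's new feature is a degree-weighted sum over itself and its neighbours.
    propagated = matmul(matmul(a_norm, features), weight)
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in propagated]


if __name__ == "__main__":
    # Toy word graph: three words linked in a chain, w0 - w1 - w2.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    weight = [[1.0], [1.0], [1.0]]
    print(gcn_layer(adj, features, weight))
```

Stacking two such layers lets each word aggregate information from words up to two hops away. In a BERT+GCN model, the node features would presumably come from BERT token representations, but the abstract does not give those details.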

Keywords: Functional structure; Graph convolution network; Generative adversarial networks; Scientific literature; Information recognition

Liu Haotan (刘昊坦), Liu Jiawei (刘家伟), Zhang Fan (张帆), Lu Wei (陆伟)


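School of Information Management, Wuhan University, Wuhan 430072

Institute of Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072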


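Funding: National Natural Science Foundation of China, Key Program (72234005); National Natural Science Foundation of China, General Program (72174157)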

2024

Journal of Information Resources Management (信息资源管理学报)
Sponsors: Society of China University Journals (中国高校科技期刊研究会); Wuhan University


Indexed in: CSSCI, CHSSCD
Impact factor: 0.885
ISSN: 2095-2171
Year, volume (issue): 2024, 14(3)