A literature information extraction method for catalytic materials

English title: A literature information extraction method for catalytic materials

Abstract: To make effective use of the unstructured text data in PDF literature, a key information extraction pipeline was designed for literature on Fischer-Tropsch synthesis catalytic materials. The pipeline extracts key information such as tables and their corresponding annotations from PDF documents. Taking the differentiable binarization network (DBNet) as the baseline model and introducing an adaptive spatial attention (ASA) module, the DB-ASA text detection model was proposed, which improved detection accuracy. Text recognition was performed with scene text recognition with a single visual model (SVTR); the model was fine-tuned on a self-built dataset in combination with domain dictionary files, reaching a text recognition accuracy of 93.87%.
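The abstract mentions domain dictionary files and a text recognition accuracy, but this page does not give the dictionary format or how accuracy is defined. The following is only a minimal sketch under stated assumptions: the dictionary is treated as a plain list of characters collected from the training labels, and accuracy is treated as the share of text lines recognized exactly. The function names `build_char_dict`, `out_of_dict`, and `exact_match_accuracy` and the sample strings are hypothetical, not taken from the paper.

```python
from typing import Iterable, List, Sequence, Set


def build_char_dict(labels: Iterable[str], extra: str = "") -> List[str]:
    """Collect every distinct character in the labels (plus extras), sorted.

    Assumption: a "domain dictionary" is a character list; the paper's
    actual file format is not described on this page.
    """
    chars: Set[str] = set(extra)
    for text in labels:
        chars.update(text)
    return sorted(chars)


def out_of_dict(text: str, char_dict: Sequence[str]) -> List[str]:
    """Return the characters of `text` that the dictionary does not cover."""
    allowed = set(char_dict)
    return [c for c in text if c not in allowed]


def exact_match_accuracy(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Share of lines whose prediction equals the ground truth exactly.

    Assumption: line-level exact match; the paper may use another definition.
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(p == g for p, g in zip(preds, golds))
    return hits / len(golds)


# Hypothetical table-cell strings, made up for illustration only.
golds = ["Co/SiO2", "CO conversion (%)", "C5+ selectivity"]
preds = ["Co/SiO2", "CO conversion (%)", "C5+ selectivty"]

char_dict = build_char_dict(golds)
accuracy = exact_match_accuracy(preds, golds)
missing = out_of_dict("α-Fe2O3", build_char_dict(["Fe2O3"]))
```

Checking labels against the dictionary before fine-tuning is one way to notice domain symbols (Greek letters, subscript-like digits, units) that a general-purpose dictionary would miss.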

English keywords:

catalytic materials; Fischer-Tropsch synthesis; information extraction; text recognition

Authors:

Gao Qiang (高强), Zhang Yangsen (张仰森), Sun Yuanming (孙圆明), Jia Qilong (贾启龙)

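Author affiliation:

Intelligent Information Processing Laboratory, Beijing Information Science and Technology University, Beijing 100192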

Keywords:

catalytic materials; Fischer-Tropsch synthesis; information extraction; text recognition

Funding:

Project of the Beijing Advanced Innovation Center for Materials Genome Engineering

Publication year:

2024

DOI:

10.16508/j.cnki.11-5866/n.2024.02.008

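Journal of Beijing Information Science and Technology University (Natural Science Edition)

Beijing Information Science and Technology University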

Impact factor: 0.363

ISSN: 1674-6864

Year, volume (issue): 2024, 39(2)

Number of references: 19