Non-Sample Equilibrium Fine-grained Financial Element Extraction

Xu Tujie (徐土杰)¹, Chen Qingcai (陈清财)¹

Author information

1. Intelligent Computing Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China

Abstract

Financial element extraction applies information extraction techniques to contracts and plans in order to pull out the entities and phrases that carry a financial document's key information, known as financial elements, with the ultimate goal of automating the processing of financial documents. Compared with existing extraction tasks, it faces a long-tailed sample distribution, fine-grained element types, and long texts containing long elements; existing extraction models cannot handle such a complex problem effectively and perform poorly on it. To address this, the paper proposes ENAPtBERT, a model that recasts element extraction as a typed head and tail pointer prediction task. On the one hand, the head and tail pointer design mitigates the impact of illegal labels and combines well with imbalance-aware loss functions to alleviate the class imbalance. On the other hand, ENAPtBERT uses the introduced element name information to improve the accuracy of both finding and classifying elements. On a financial element extraction dataset, ENAPtBERT improves Micro-F1 by 2.50% and Macro-F1 by at least 2.66% over existing extraction models, demonstrating its effectiveness on complex extraction problems.
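To make the idea of typed head and tail pointers concrete, the following is a minimal sketch of how per-token, per-type head and tail scores might be decoded into typed spans. It is not the ENAPtBERT implementation: the function name `decode_spans`, the `threshold` and `max_len` parameters, and the nearest-tail pairing rule are all assumptions chosen for illustration. Because spans are built only from matched pointer pairs, sequences such as an inside tag without a begin tag cannot arise.

```python
from typing import List, Sequence, Tuple

# (start index, end index inclusive, element type) -- illustrative type alias
Span = Tuple[int, int, str]


def decode_spans(
    head_probs: Sequence[Sequence[float]],
    tail_probs: Sequence[Sequence[float]],
    type_names: Sequence[str],
    threshold: float = 0.5,   # hypothetical decision threshold
    max_len: int = 50,        # hypothetical cap on element length in tokens
) -> List[Span]:
    """Pair each predicted head with the nearest tail of the same type.

    head_probs[i][t] is the assumed probability that token i starts an
    element of type t; tail_probs[i][t] is the same for element ends.
    """
    spans: List[Span] = []
    n = len(head_probs)
    for t, name in enumerate(type_names):
        for i in range(n):
            if head_probs[i][t] < threshold:
                continue
            # Scan forward for the first tail of the same type.
            for j in range(i, min(n, i + max_len)):
                if tail_probs[j][t] >= threshold:
                    spans.append((i, j, name))
                    break
    return spans


# Toy example: 5 tokens, 2 hypothetical element types.
types = ["party", "amount"]
head = [[0.9, 0.1], [0.1, 0.1], [0.2, 0.8], [0.1, 0.1], [0.0, 0.0]]
tail = [[0.1, 0.1], [0.7, 0.0], [0.0, 0.1], [0.1, 0.9], [0.0, 0.0]]
result = decode_spans(head, tail, types)
print(result)
```

Since every type has its own pair of binary pointer outputs, a per-type reweighted or focal-style loss can be applied to each output independently, which is one plausible reading of why the design combines well with imbalance-aware losses.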

Key words

financial element extraction/imbalance/fine-grained/element name information

Funding

National Natural Science Foundation of China (61872113)

Publication year

2024

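Journal of Chinese Information Processing (中文信息学报)

Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences

Indexed in: CSTPCD, CSCD, CHSSCD, Peking University Core Journals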

Impact factor: 0.8

ISSN: 1003-0077

Number of references: 3
