Construction of Entity Corpus for Terrorist Attack Event
李林瑛1, 王孙和1, 曲云平1
Author information
1. School of Software, Dalian University of Foreign Languages, Dalian, Liaoning 116044
Abstract
To address the scarcity of text corpora for terrorist attack events, this paper establishes an entity annotation standard for such events and constructs an entity corpus by annotating data from the Global Terrorism Database (GTD). To reduce the high labor and time costs of data annotation, the Baidu Universal Information Extraction (UIE) model is used for auxiliary annotation because of its strong generalization ability on very small samples. The experimental results demonstrate that the annotation scheme is effective and that it reduces annotation time to some extent.
Key words
terrorist attack event/entity corpus/universal information extraction/Global Terrorism Database/named entity recognition