Construction of an Entity Corpus for Terrorist Attack Events
To address the scarcity of text corpora for terrorist attack events, this paper defines an entity annotation standard for such events and applies it to data from the Global Terrorism Database (GTD) to construct an entity corpus of terrorist attack events. Because manual annotation is costly in both labor and time, the Baidu Universal Information Extraction (UIE) model, which generalizes well from very few samples, is used to assist annotation. Experimental results show that the annotation scheme is effective and that model-assisted annotation reduces annotation time to some extent.
terrorist attack event; entity corpus; universal information extraction; Global Terrorism Database; named entity recognition
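As a rough illustration of what model-assisted annotation can look like in practice, the following minimal sketch turns span predictions from an extraction model into draft annotations for a human annotator to review. It is not the paper's pipeline: the label names in `SCHEMA`, the `to_preannotations` helper, the confidence threshold, and the example sentence are all hypothetical, and the prediction format (text, start, end, probability per label) is only assumed to resemble what a span-based extraction model returns.

```python
from typing import Dict, List

# Hypothetical entity labels for illustration; the actual annotation
# standard defined in the paper is not reproduced here.
SCHEMA = ["perpetrator", "target", "location", "weapon"]


def to_preannotations(
    text: str,
    prediction: Dict[str, List[dict]],
    threshold: float = 0.5,
) -> List[dict]:
    """Convert model span predictions into draft annotations for review."""
    drafts = []
    for label, spans in prediction.items():
        # Ignore labels that are not part of the annotation schema.
        if label not in SCHEMA:
            continue
        for span in spans:
            # Drop low-confidence spans so annotators see fewer false hits.
            if span["probability"] < threshold:
                continue
            # Drop spans whose offsets do not match the source text.
            if text[span["start"]:span["end"]] != span["text"]:
                continue
            drafts.append(
                {
                    "label": label,
                    "start": span["start"],
                    "end": span["end"],
                    "text": span["text"],
                }
            )
    # Present drafts in reading order.
    drafts.sort(key=lambda d: (d["start"], d["end"]))
    return drafts


# Invented sentence and invented model output, not taken from the GTD.
example_text = "Gunmen attacked a police checkpoint in Kabul."
example_prediction = {
    "perpetrator": [{"text": "Gunmen", "start": 0, "end": 6, "probability": 0.93}],
    "target": [{"text": "police checkpoint", "start": 18, "end": 35, "probability": 0.88}],
    "location": [{"text": "Kabul", "start": 39, "end": 44, "probability": 0.97}],
    "weapon": [{"text": "attacked", "start": 7, "end": 15, "probability": 0.21}],
}

drafts = to_preannotations(example_text, example_prediction)
for draft in drafts:
    print(draft["label"], draft["start"], draft["end"], draft["text"])
```

In a workflow of this kind the annotator corrects or confirms the drafts rather than labeling from scratch, which is where any time saving would come from; the threshold trades missed entities against spurious suggestions and would need tuning on real data.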