Modern Information Technology, 2024, Vol. 8, Issue 19: 44-47, 52. DOI: 10.19850/j.cnki.2096-4706.2024.19.009

Construction of Entity Corpus for Terrorist Attack Event

Li Linying¹, Wang Sunhe¹, Qu Yunping¹

Author information

  • 1. School of Software, Dalian University of Foreign Languages, Dalian, Liaoning 116044, China

Abstract

Text corpora on terrorist attack events are scarce. To address this, this paper establishes an entity annotation specification for terrorist attack events and constructs an entity corpus for such events by annotating entities in data from the Global Terrorism Database (GTD). Because data annotation is costly in both labor and time, and because Baidu's Universal Information Extraction (UIE) model generalizes well from very few examples, the UIE model is used to assist annotation. Experimental results demonstrate the effectiveness of the annotation scheme and show that annotation time is reduced to some extent.
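The abstract describes using UIE to pre-annotate GTD text so that annotators correct model suggestions instead of labeling from scratch. Below is a minimal sketch of one step such a workflow might include: turning predicted entity spans into character-level BIO tags for human review. It assumes predictions arrive as a dict mapping each entity type to a list of spans with `text`, `start`, `end`, and `probability` keys, roughly the per-text output shape of PaddleNLP's UIE `Taskflow`. The function name `spans_to_bio`, the 0.5 confidence threshold, the BIO tagging scheme, the entity types, and the example sentence are all hypothetical and are not taken from the paper's annotation specification.

```python
from typing import Dict, List

# Assumed pre-annotation format (hypothetical, UIE-like): entity type ->
# list of spans, each with "text", "start", "end" (character offsets,
# end exclusive) and "probability". Entity types are illustrative only.
Prediction = Dict[str, List[dict]]


def spans_to_bio(text: str, prediction: Prediction, threshold: float = 0.5) -> List[str]:
    """Convert model-predicted spans into character-level BIO tags.

    Spans below `threshold` are dropped so the annotator only reviews
    reasonably confident suggestions. When spans overlap, the one with
    the higher probability is kept and the other is skipped.
    """
    tags = ["O"] * len(text)
    # Collect confident candidates, highest probability first.
    candidates = [
        (span["probability"], label, span)
        for label, spans in prediction.items()
        for span in spans
        if span["probability"] >= threshold
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    for _, label, span in candidates:
        start, end = span["start"], span["end"]
        # Skip malformed offsets or spans whose text does not match the source.
        if not (0 <= start < end <= len(text)) or text[start:end] != span["text"]:
            continue
        # Skip a span if any of its characters is already tagged.
        if any(tag != "O" for tag in tags[start:end]):
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


if __name__ == "__main__":
    sentence = "Gunmen attacked a market in Kabul on Monday."
    predicted = {
        "perpetrator": [{"text": "Gunmen", "start": 0, "end": 6, "probability": 0.91}],
        "location": [{"text": "Kabul", "start": 28, "end": 33, "probability": 0.97}],
        "date": [{"text": "Monday", "start": 37, "end": 43, "probability": 0.42}],
    }
    tags = spans_to_bio(sentence, predicted)
    # Print where each suggested entity begins.
    print([(sentence[i], t) for i, t in enumerate(tags) if t.startswith("B-")])
```

Running the script prints `[('G', 'B-perpetrator'), ('K', 'B-location')]`: the low-confidence `date` span is withheld, so the annotator would add it manually if it is correct. Filtering by confidence and resolving overlaps in favor of the more confident span keeps the pre-annotation conservative, which is one plausible way to trade recall for less correction work.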

Key words

terrorist attack event/entity corpus/universal information extraction/Global Terrorism Database/named entity recognition

Publication year: 2024
Modern Information Technology
Guangdong Institute of Electronics
ISSN: 2096-4706