Research on Fine-grained Named-Entity-Recognition Method for Public-Opinion Texts in Northeast Asia

The evolving international situation in Northeast Asia is closely tied to China's development. Constructing a public-opinion knowledge graph for this region enables effective monitoring of public-opinion hotspots. This not only helps guide the healthy development of public opinion and assists government decision-making, but is also of significant value for guarding against political marketing, enhancing national language capacity, and building harmonious and stable international relations. Named Entity Recognition (NER) is a key technology and core task in constructing knowledge graphs and has attracted extensive attention from researchers. This study uses real-time trending public-opinion texts related to Northeast Asia, collected from social media and portal websites, as data sources. Taking full account of the regional characteristics and geopolitical structure of Northeast Asia, it establishes a fine-grained NER dataset comprising 10 major categories and 35 subcategories. It further proposes a public-opinion entity-recognition model based on the pretrained language model RoBERTa and a multilayer residual BiLSTM-CRF architecture (RoBERTa-ResBiLSTM-CRF). After the model completes label prediction, a post-processing strategy based on rule templates is applied to improve overall entity-recognition performance. Experimental results show that the proposed public-opinion NER model outperforms mainstream traditional neural-network models, validating the effectiveness of the approach.
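
The abstract does not list the rule templates used in post-processing, so the following is only a minimal sketch of what one such rule could look like, assuming the model emits BIO labels per character. The function names `repair_bio` and `extract_entities`, the `LOC` label, and the rule itself (promoting an orphan `I-` tag to `B-`) are illustrative assumptions, not the authors' method.

```python
from typing import List, Optional, Tuple


def repair_bio(tags: List[str]) -> List[str]:
    """Repair label sequences that violate the BIO scheme.

    Hypothetical rule: an I-X tag that does not continue an entity of
    type X (previous tag is O, or belongs to another type) is rewritten
    to B-X so that every entity starts with a B- tag.
    """
    fixed: List[str] = []
    prev_type: Optional[str] = None  # type of the previous token, None if outside
    for tag in tags:
        if tag.startswith("I-"):
            cur_type = tag[2:]
            if prev_type != cur_type:
                tag = "B-" + cur_type  # orphan I- tag becomes an entity start
            prev_type = cur_type
        elif tag.startswith("B-"):
            prev_type = tag[2:]
        else:
            prev_type = None
        fixed.append(tag)
    return fixed


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (text, type) spans from a valid BIO sequence."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    cur_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), cur_type))
            current, cur_type = [token], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.append(("".join(current), cur_type))
            current, cur_type = [], None
    if current:
        entities.append(("".join(current), cur_type))
    return entities


# Usage: a predicted sequence with two orphan I- tags (made-up example).
tokens = list("大连位于辽宁")
predicted = ["I-LOC", "I-LOC", "O", "O", "I-LOC", "I-LOC"]
repaired = repair_bio(predicted)
entities = extract_entities(tokens, repaired)
```

Running the repair before span extraction means the extractor only has to handle well-formed sequences. A real rule set for this task would presumably also use domain-specific templates, which the abstract does not describe.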

fine-grained; Named Entity Recognition (NER); public opinion texts; deep learning; pre-trained language models

隗昊、刁宏悦、孔亮宸、邓耀臣

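School of Software, Dalian University of Foreign Languages, Dalian, Liaoning 116044

China Northeast Asia Language Research Center, Dalian University of Foreign Languages, Dalian, Liaoning 116044

Liaoning Provincial New Liberal Arts Digital Humanities Innovation Laboratory, Dalian University of Foreign Languages, Dalian, Liaoning 116044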

Basic Scientific Research Project of Higher Education Institutions of Liaoning Province

LJKQZ20222451

2024

Computer Engineering (计算机工程)
East China Institute of Computing Technology; Shanghai Computer Society

CSTPCD; Peking University Core Journal
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(5)