Journal of Computer Research and Development, 2024, Vol. 61, Issue 4: 916-928. DOI: 10.7544/issn1000-1239.202220999

Noise Detection for Distant Supervised Named Entity Recognition

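Wang Jiacheng (王嘉诚) 1, Wang Kai (王凯) 1, Wang Haofen (王昊奋) 2, Du Wen (杜渂) 3, He Zhidong (何之栋) 3, Ruan Tong (阮彤) 1, Liu Jingping (刘井平) 1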

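Author information

  • 1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237
  • 2. College of Design and Innovation, Tongji University, Shanghai 200092
  • 3. DS Information Technology Co., Ltd., Shanghai 200032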

Abstract

For distantly supervised named entity recognition (NER), many reinforcement learning based approaches exploit the strong decision-making ability of reinforcement learning to filter noise from the automatically labeled data generated by distant supervision. However, the policy networks these approaches use typically have simple architectures and therefore identify noisy instances poorly. Moreover, they judge instances at the sentence level, so correct information within a partially noisy sentence is discarded along with the noise. To address these problems, we propose a new reinforcement learning based method for distantly supervised NER, named RLTL-DSNER, which identifies correct instances at the token level in the noisy data generated by distant supervision, thereby reducing the negative impact of noisy instances on the distantly supervised NER model. Specifically, we introduce a tag confidence function into the policy network to identify correct instances accurately. In addition, we propose a novel pre-training strategy for the NER model, which provides accurate state representations and effective reward values for the initial training of the reinforcement learning model and guides it to update in the right direction. Experiments on four datasets verify the superiority of RLTL-DSNER; on the NEWS dataset, it achieves a 4.28% F1 improvement over existing state-of-the-art methods.
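The difference between sentence-level and token-level filtering can be illustrated with a minimal Python sketch. It is not the RLTL-DSNER method: the per-token confidence scores, the fixed `threshold`, and all function names are hypothetical stand-ins for the decisions that the paper's policy network and tag confidence function would make.

```python
from typing import List, Tuple

# (word, distantly supervised tag, hypothetical confidence that the tag is correct)
Token = Tuple[str, str, float]
Sentence = List[Token]


def sentence_level_filter(sentences: List[Sentence], threshold: float = 0.5) -> List[Sentence]:
    """Keep a sentence only if every token's tag is trusted; drop it otherwise."""
    return [s for s in sentences if all(conf >= threshold for _, _, conf in s)]


def token_level_mask(sentence: Sentence, threshold: float = 0.5) -> List[int]:
    """Return a 0/1 mask: 1 keeps the token for training, 0 ignores it."""
    return [1 if conf >= threshold else 0 for _, _, conf in sentence]


def kept_tokens_sentence_level(sentences: List[Sentence], threshold: float = 0.5) -> int:
    """Count tokens that survive sentence-level filtering."""
    return sum(len(s) for s in sentence_level_filter(sentences, threshold))


def kept_tokens_token_level(sentences: List[Sentence], threshold: float = 0.5) -> int:
    """Count tokens that survive token-level filtering."""
    return sum(sum(token_level_mask(s, threshold)) for s in sentences)


# Toy data (invented for illustration): "Paris" is wrongly tagged "O".
SENTENCES: List[Sentence] = [
    [("Alice", "B-PER", 0.9), ("visited", "O", 0.8), ("Paris", "O", 0.2)],
    [("He", "O", 0.9), ("left", "O", 0.9)],
]

if __name__ == "__main__":
    print("sentence-level keeps", kept_tokens_sentence_level(SENTENCES), "tokens")
    print("token-level keeps", kept_tokens_token_level(SENTENCES), "tokens")
```

In this toy data, sentence-level filtering discards the whole first sentence because of one mislabeled token, while the token-level mask drops only that token and keeps the correctly tagged "Alice" and "visited" for training.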

Key words

named entity recognition/distant supervision/deep reinforcement learning/noise detection/pre-training strategy

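Funding

Shanghai Special Fund for Promoting High-Quality Industrial Development (2021-GZL-RGZN-01018)

National Key Research and Development Program of China (2021YFC2701800)

National Key Research and Development Program of China (2021YFC2701801)

Zhejiang Lab Open Fund (2019ND0AB01)

Shanghai Sailing Program for Young Science and Technology Talents (23YF1409400)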

Publication year

2024
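Journal of Computer Research and Development
Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 2.649
ISSN: 1000-1239
Number of references: 38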