Many approaches to distantly supervised named entity recognition (NER) are based on reinforcement learning, exploiting its strong decision-making ability to detect noise in the automatically labeled data that distant supervision generates. However, the policy networks these approaches use typically have simple structures, which weakens their ability to recognize noisy instances. Furthermore, correct instances are identified at the sentence level, so part of the useful information in a sentence is discarded. In this paper, we propose a new reinforcement learning based method for distantly supervised NER, named RLTL-DSNER, which detects correct instances at the token level in the noisy data generated by distant supervision, with the aim of reducing the negative impact of noisy instances on the distantly supervised NER model. Specifically, we introduce a tag confidence function to identify correct instances accurately. In addition, we propose a novel pre-training strategy for the NER model. This strategy provides accurate state representations and effective reward values for the initial training of the reinforcement learning model, which helps guide its updates in the right direction. We conduct experiments on four datasets to verify the superiority of RLTL-DSNER, which achieves a 4.28% F1 improvement on the NEWS dataset over state-of-the-art methods.
Key words
named entity recognition/distant supervision/deep reinforcement learning/noise detection/pre-training strategy
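
The contrast the abstract draws between sentence-level and token-level instance selection can be illustrated with a minimal sketch. This is not the RLTL-DSNER policy network or its tag confidence function: the per-token confidence scores, the fixed threshold, and the function names `sentence_level_filter` and `token_level_filter` are all hypothetical, chosen only to show why discarding a whole sentence loses useful labels that token-level selection would keep.

```python
from typing import List, Optional

Tag = Optional[str]  # None marks a token whose label is excluded from training


def sentence_level_filter(tags: List[str], confidences: List[float],
                          threshold: float) -> List[Tag]:
    """Hypothetical sentence-level rule: keep all labels or none of them."""
    if all(c >= threshold for c in confidences):
        return list(tags)
    return [None] * len(tags)


def token_level_filter(tags: List[str], confidences: List[float],
                       threshold: float) -> List[Tag]:
    """Hypothetical token-level rule: drop only the low-confidence labels."""
    return [t if c >= threshold else None
            for t, c in zip(tags, confidences)]


# Illustrative distantly supervised labels with made-up confidence scores.
tags = ["B-PER", "I-PER", "O", "B-LOC"]
confidences = [0.95, 0.90, 0.85, 0.30]

sentence_kept = sentence_level_filter(tags, confidences, threshold=0.5)
token_kept = token_level_filter(tags, confidences, threshold=0.5)

print(sentence_kept)  # every label discarded because one token is doubtful
print(token_kept)     # only the doubtful label is masked
```

In this toy case a single low-confidence token causes the sentence-level rule to throw away three labels that look reliable, while the token-level rule masks only the doubtful one. In the method described above, the selection decision is made by a reinforcement learning model, not by a fixed threshold.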