
Named entity recognition method combined with self-training model

Named entity recognition datasets often contain too few samples for certain entity categories, which weakens the model's ability to learn the features of those categories and lowers overall performance. To address this problem, a named entity recognition method combined with a self-training model is proposed. A teacher model is trained on the existing named entity recognition dataset. An improved text similarity function is used to search for the unlabeled text most similar to the original dataset. The teacher model generates pseudo-labels for this unlabeled text, and the pseudo-labeled data are mixed with the labeled dataset to retrain a student model for the downstream named entity recognition task. Experimental results show that, compared with the baseline model, the method achieves better performance on the public datasets MSRA and CONLL03 and on a legal entity recognition dataset.
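The teacher/student pipeline summarized in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "model" is a toy lookup tagger standing in for a real NER network, Jaccard token overlap stands in for the paper's improved text similarity function (which the abstract does not specify), and the names `train_tagger`, `select_similar`, `self_train`, and `top_k` are hypothetical.

```python
from collections import Counter, defaultdict


def train_tagger(sentences):
    """Toy stand-in for an NER model (assumption): most frequent tag per token."""
    counts = defaultdict(Counter)
    for tokens, tags in sentences:
        for tok, tag in zip(tokens, tags):
            counts[tok][tag] += 1
    lookup = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
    # Unknown tokens fall back to the "outside" tag.
    return lambda tokens: [lookup.get(tok, "O") for tok in tokens]


def jaccard(a, b):
    """Jaccard overlap of two token lists; a placeholder similarity (assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def select_similar(labeled, unlabeled, top_k):
    """Keep the top_k unlabeled sentences closest to any labeled sentence."""
    def score(tokens):
        return max(jaccard(tokens, lab_tokens) for lab_tokens, _ in labeled)
    return sorted(unlabeled, key=score, reverse=True)[:top_k]


def self_train(labeled, unlabeled, top_k=2):
    teacher = train_tagger(labeled)                        # 1. train the teacher
    selected = select_similar(labeled, unlabeled, top_k)   # 2. similarity search
    pseudo = [(toks, teacher(toks)) for toks in selected]  # 3. pseudo-label
    student = train_tagger(labeled + pseudo)               # 4. retrain on the mix
    return student, pseudo


labeled = [
    (["Alice", "visited", "Wuhan"], ["B-PER", "O", "B-LOC"]),
    (["Bob", "lives", "in", "Jinan"], ["B-PER", "O", "O", "B-LOC"]),
]
unlabeled = [
    ["Alice", "lives", "in", "Wuhan"],
    ["stock", "prices", "fell"],
    ["Bob", "visited", "Jinan"],
]

student, pseudo = self_train(labeled, unlabeled, top_k=2)
for tokens, tags in pseudo:
    print(list(zip(tokens, tags)))
```

In this toy setup the sentence that shares no tokens with the labeled data is filtered out before pseudo-labeling, which mirrors the role the similarity search plays in the described method. A real system would replace the lookup tagger with a trained sequence-labeling model and would likely filter pseudo-labels further.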

Keywords: named entity recognition; self-training; text similarity; natural language processing; few-shot

Xiao Wei, Zheng Gengsheng, Chen Yujia


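School of Computer Science and Engineering, School of Artificial Intelligence, Wuhan Institute of Technology, Wuhan 430205, Hubei, China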

Hubei Key Laboratory of Intelligent Robot, Wuhan 430205, Hubei, China


Funding: National Natural Science Foundation of China, Young Scientists Fund (62106179)

2024

Journal of Shandong University (Engineering Science)
Shandong University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.634
ISSN: 1672-3961
Year, volume (issue): 2024, 54(2)