Named entity recognition method combined with a self-training model
Some entity categories in named entity recognition datasets have too few samples, which hampers the model's ability to learn the features of those categories and lowers overall performance. To address this issue, this study proposes a named entity recognition method that incorporates a self-training model. First, a teacher model is trained on the available labeled named entity recognition dataset. Next, an improved text similarity function is used to retrieve the unlabeled text most similar to the original dataset. The teacher model then generates pseudo-labels for this unlabeled text. Finally, the pseudo-labeled text is combined with the labeled dataset to train a student model for the downstream named entity recognition task. Experimental results show that, compared with the baseline model, the method achieves better performance on the public MSRA and CONLL03 datasets and on a legal entity recognition dataset.
named entity recognition; self-training; text similarity; natural language processing; few-shot
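
The four steps summarized in the abstract (train a teacher, retrieve similar unlabeled text, pseudo-label it, train a student) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the model architecture or the improved similarity function, so `train_tagger` is a toy token-memorizing tagger standing in for a real NER model, `jaccard` is a plain token-overlap score standing in for the improved text similarity function, and the names `select_similar`, `self_train`, and the parameter `k` are hypothetical.

```python
from collections import Counter, defaultdict

def train_tagger(data):
    """Toy stand-in for an NER model (assumption): memorize each token's
    most frequent tag and predict "O" for unseen tokens."""
    counts = defaultdict(Counter)
    for sentence in data:
        for token, tag in sentence:
            counts[token][tag] += 1
    lexicon = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
    return lambda tokens: [(t, lexicon.get(t, "O")) for t in tokens]

def jaccard(a, b):
    """Token-set overlap; a placeholder for the paper's improved
    similarity function, which the abstract does not define."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def select_similar(labeled, unlabeled, k):
    """Return the k unlabeled sentences closest to any labeled sentence.
    Assumes `labeled` is non-empty."""
    labeled_tokens = [[tok for tok, _ in s] for s in labeled]

    def score(sentence):
        return max(jaccard(sentence, ref) for ref in labeled_tokens)

    return sorted(unlabeled, key=score, reverse=True)[:k]

def self_train(labeled, unlabeled, k):
    teacher = train_tagger(labeled)                   # 1. train the teacher
    selected = select_similar(labeled, unlabeled, k)  # 2. retrieve similar text
    pseudo = [teacher(s) for s in selected]           # 3. generate pseudo-labels
    return train_tagger(labeled + pseudo)             # 4. train the student

# Tiny usage example with made-up data.
labeled = [[("Alice", "B-PER"), ("visited", "O"), ("Paris", "B-LOC")]]
unlabeled = [["Alice", "visited", "Berlin"], ["stocks", "fell", "sharply"]]
student = self_train(labeled, unlabeled, k=1)
print(student(["Alice", "in", "Paris"]))
```

Because the toy tagger only memorizes tokens, the student here gains nothing over the teacher; with a neural tagger, the pseudo-labeled sentences would supply extra training signal for under-represented categories, which is the effect the method relies on.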