Domain-adaptive Entity Resolution Algorithm Based on Semi-supervised Learning
Entity resolution is a fundamental step in many natural language processing tasks; it aims to determine whether two data entries refer to the same entity. Existing deep learning-based solutions for entity resolution typically require a large amount of annotated data, even when pre-trained language models are used for training, and obtaining such annotated data is challenging in real-world scenarios. To address this issue, a domain-adaptive entity resolution model based on semi-supervised learning is proposed. First, a classifier is trained on the source domain. Domain adaptation is then used to reduce the distributional difference between the source and target domains. Finally, soft pseudo-labels from the augmented target domain are added to the source domain for iterative training, enabling knowledge transfer from the source domain to the target domain. Comparison and ablation experiments are performed on 13 datasets from various domains. The results show that, compared to unsupervised baseline models, the proposed model achieves average F1 score improvements of 2.84%, 9.16%, and 7.1% across multiple datasets. Compared to supervised baseline models, it achieves comparable performance with only 20% to 40% of the labels. Ablation experiments further demonstrate the effectiveness of the proposed model and show that better entity resolution results can be obtained in general (the relevant code is available; see footnote 1).
Entity resolution; Domain adaptation; Pseudo-labels; Pre-trained language model; Data augmentation
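The three-stage procedure summarized in the abstract (source-domain classifier, domain adaptation, iterative training with soft pseudo-labels from augmented target data) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes each record pair has already been turned into a numeric feature vector, and it substitutes logistic regression for the pre-trained language model, a simple mean shift for the domain adaptation step, and Gaussian noise for data augmentation. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y_soft, epochs=300, lr=0.5):
    """Fit logistic regression by gradient descent; labels may be soft (in [0, 1])."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - y_soft   # cross-entropy gradient w.r.t. logits
        w -= lr * (X.T @ grad) / len(X)
        b -= lr * grad.mean()
    return w, b

def align_means(X_src, X_tgt):
    """Assumed stand-in for domain adaptation: shift target features to the source mean."""
    return X_tgt - X_tgt.mean(axis=0) + X_src.mean(axis=0)

def self_train(X_src, y_src, X_tgt_aligned, rounds=3, threshold=0.8, seed=0):
    """Iteratively add confident soft pseudo-labels from augmented target data."""
    rng = np.random.default_rng(seed)
    y_src = y_src.astype(float)
    w, b = train_logreg(X_src, y_src)                 # stage 1: source-only classifier
    for _ in range(rounds):
        # Assumed augmentation: small Gaussian perturbation of target features.
        X_aug = X_tgt_aligned + rng.normal(scale=0.05, size=X_tgt_aligned.shape)
        p = sigmoid(X_aug @ w + b)                    # soft pseudo-labels
        keep = np.maximum(p, 1.0 - p) >= threshold    # keep confident predictions only
        X_mix = np.vstack([X_src, X_aug[keep]])
        y_mix = np.concatenate([y_src, p[keep]])
        w, b = train_logreg(X_mix, y_mix)             # retrain on source + pseudo-labeled target
    return w, b

def make_domain(n, shift, rng):
    """Synthetic match / non-match pairs; `shift` simulates a domain gap."""
    y = np.arange(n) % 2
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 2.0, -2.0) + shift
    return X, y

def demo():
    rng = np.random.default_rng(42)
    X_src, y_src = make_domain(200, 0.0, rng)
    X_tgt, y_tgt = make_domain(200, 3.0, rng)         # target labels used for evaluation only
    X_tgt_aligned = align_means(X_src, X_tgt)         # stage 2: reduce the distribution gap
    w, b = self_train(X_src, y_src, X_tgt_aligned)    # stage 3: pseudo-label iterations
    pred = (sigmoid(X_tgt_aligned @ w + b) >= 0.5).astype(int)
    return float((pred == y_tgt).mean())
```

Calling `demo()` returns the accuracy on the synthetic target domain. The confidence threshold and the number of rounds are illustrative choices; the abstract does not state how the proposed model selects or weights its pseudo-labels.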