Unsupervised sentence alignment is an important and challenging problem in natural language processing. The task is to identify corresponding sentences across different languages without labeled parallel data, and it provides basic support for cross-lingual information retrieval, machine translation, and other applications. This survey summarizes the current state of research on unsupervised sentence alignment from three aspects: methods, challenges, and applications. In terms of methods, existing approaches include those based on multilingual embeddings, clustering, and self-supervised or generative models. Unsupervised sentence alignment nevertheless faces challenges such as linguistic diversity, differences between languages, and domain adaptation. The ambiguity and diversity of languages complicate sentence alignment, especially for low-resource languages. Despite these challenges, unsupervised sentence alignment has important applications in cross-lingual information retrieval, machine translation, and multilingual information aggregation. By aligning sentences without supervision, information in different languages can be integrated to improve the effectiveness of information retrieval. At the same time, ongoing research in this field continues to drive technical innovation, providing opportunities to achieve more accurate and robust unsupervised sentence alignment.
Unsupervised sentence alignment; Natural language processing; Machine translation; Self-supervised; Low-resource