Text similarity calculation is a branch of natural language processing that measures the similarity between two words, sentences, or texts, and it is used in many application scenarios. Research on text similarity calculation plays an important role in the development of artificial intelligence. Conventional text similarity calculation was based on the surface form of character strings. With the introduction of word vectors, text similarity can also be modeled and calculated using statistical and deep learning methods, which can in turn be combined with pre-trained models. This paper first divides text similarity calculation methods into five categories: character string-based, word vector-based, pre-trained model-based, deep learning-based, and other methods, and briefly introduces each category. Then, following the principles behind the different categories, it discusses common methods such as the edit distance, Hamming distance, bag-of-words model, Vector Space Model (VSM), Deep Structured Semantic Model (DSSM), and Simple Contrastive Learning of Sentence Embeddings (SimCSE). Finally, commonly used datasets and evaluation criteria for text similarity calculation are summarized and analyzed, and directions for future development are outlined.
text similarity; character string; word vector; pre-trained model; deep learning