Text Similarity Measurement Method of Scientific Research Projects Based on Hierarchical Depth Semantics
Duplicate checking of research project documents is an important issue in the academic field, and text similarity measurement is a key step in that check. Current text similarity measurement methods for research projects are mainly based on string comparison or TF-IDF, neither of which takes the semantic features of the text into account. This paper proposes a hierarchical semantic similarity measurement method for electric power technology project documents. The method uses the pre-trained model RoBERTa-WWM together with Whitening to extract the semantic features of sentences, and builds a hierarchical deep semantic similarity between project texts using cosine similarity. The hierarchy has three levels: similarity between sentences, similarity between chapters, and similarity between articles. The paper demonstrates the effectiveness of the Whitening method on the AFQMC data set, and verifies on 50 power technology project articles and their corresponding translated articles that the proposed method outperforms similarity measures based on string distance and TF-IDF.
text similarity; natural language processing; scientific research projects
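The three-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: in the paper the sentence vectors would come from RoBERTa-WWM followed by Whitening, whereas here they are toy 2-dimensional vectors. The abstract does not state how sentence scores are combined into chapter and article scores, so the "best match, then average" rule used below, along with the function names `chapter_similarity` and `article_similarity`, is an assumption made purely for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Chapter = Sequence[Vector]    # a chapter is a list of sentence vectors
Article = Sequence[Chapter]   # an article is a list of chapters


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Sentence level: cosine of the angle between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def chapter_similarity(ch_a: Chapter, ch_b: Chapter) -> float:
    """Chapter level (ASSUMED rule): for each sentence in ch_a take its
    best-matching sentence in ch_b, then average those best scores."""
    if not ch_a or not ch_b:
        return 0.0
    best = [max(cosine_similarity(s, t) for t in ch_b) for s in ch_a]
    return sum(best) / len(best)


def article_similarity(art_a: Article, art_b: Article) -> float:
    """Article level (ASSUMED rule): for each chapter in art_a take its
    best-matching chapter in art_b, then average those best scores."""
    if not art_a or not art_b:
        return 0.0
    best = [max(chapter_similarity(c, d) for d in art_b) for c in art_a]
    return sum(best) / len(best)


# Toy stand-ins for whitened sentence embeddings (hypothetical data).
article_a = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
article_b = [[[1.0, 0.0]], [[1.0, 1.0], [0.0, 2.0]]]

score = article_similarity(article_a, article_b)
print(f"article-level similarity: {score:.4f}")
```

Under this assumed rule the score is directional: `article_similarity(a, b)` measures how well `a` is covered by `b`, which suits checking a new submission against an existing project. A symmetric score could be obtained by averaging the two directions.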