RESEARCH ON SEMANTIC SIMILARITY CALCULATION OF CHINESE SHORT TEXT BASED ON ROBERTA
Aiming at the insufficient feature-extraction ability of traditional text semantic similarity models based on the Siamese network, SRoberta-SelfAtt, a fusion of the Siamese network and the RoBERTa pre-training model, is proposed. On the Siamese architecture, the RoBERTa (A Robustly Optimized BERT Pretraining Approach) pre-training model encodes the original text pair into character-level vectors, and a self-attention mechanism captures the associations between different words in the text. Sentence vectors for the text pair are obtained through a pooling strategy; the resulting representations are then interacted and merged, and the loss value is computed in a fully connected layer to evaluate the semantic similarity of the text pair. The model was tested on three datasets across two types of tasks. The results show that the proposed model outperforms the compared models and provides an effective basis for further research on improving the accuracy of text semantic similarity calculation.
Siamese network; RoBERTa; Self-attention; Chinese short text; Semantic similarity calculation
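
The following is a minimal sketch of the SRoberta-SelfAtt pipeline described in the abstract, written in PyTorch with the Hugging Face transformers library. It assumes a publicly available Chinese RoBERTa checkpoint (hfl/chinese-roberta-wwm-ext), mean pooling as the pooling strategy, and a [u; v; |u - v|] concatenation as the interaction scheme; these choices, along with the head count and output layer, are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SRobertaSelfAtt(nn.Module):
    """Siamese RoBERTa with a self-attention layer over token vectors (sketch)."""

    def __init__(self, model_name="hfl/chinese-roberta-wwm-ext", num_heads=8):
        super().__init__()
        # One shared encoder applied to both texts: the Siamese architecture.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Self-attention to capture associations between different words/characters.
        self.self_att = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        # Interaction features [u; v; |u - v|] -> scalar similarity score.
        self.classifier = nn.Linear(hidden * 3, 1)

    def encode(self, enc):
        # RoBERTa encodes the text into character-level vectors (B, T, H).
        out = self.encoder(**enc).last_hidden_state
        att, _ = self.self_att(out, out, out,
                               key_padding_mask=enc["attention_mask"] == 0)
        # Mean pooling over non-padding positions yields the sentence vector.
        mask = enc["attention_mask"].unsqueeze(-1).float()
        return (att * mask).sum(1) / mask.sum(1)

    def forward(self, enc_a, enc_b):
        u, v = self.encode(enc_a), self.encode(enc_b)
        # Interact and merge the two sentence vectors, then score similarity.
        feats = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return torch.sigmoid(self.classifier(feats)).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = SRobertaSelfAtt()
a = tokenizer(["今天天气很好"], return_tensors="pt", padding=True)
b = tokenizer(["今天天气不错"], return_tensors="pt", padding=True)
score = model(a, b)  # train with e.g. nn.BCELoss() against 0/1 similarity labels
```

Because the encoder weights are shared between the two branches, the model learns a single embedding space in which the similarity of any text pair can be scored, which is the core property of the Siamese design.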