To address unclear word boundaries in sentences and the neglect of word-context relationships during training, an algorithm for suppressing ambiguous word segmentation information, based on multiple candidate segmentations, was designed. The weights of the different candidate segmentations of a sentence were computed from a pre-trained timing frequency table; the most likely segmentation was distinguished from the others and merged into the sentence; finally, the segmentation weight information was added to the self-attention mechanism to extract the contextual information of the sentence. In this way, valid boundary information from the correct segmentation was added, and the symmetric contextual relationships were regulated against errors from ambiguous segmentations. The algorithm was compared with the MarkBert and W2NER algorithms. Experiments on public datasets such as Resume, MSRA, Weibo and OntoNotes showed that the algorithm improved prediction accuracy and robustness as sentence length increased, and improved prediction accuracy as dataset size increased.
named entity recognition; pre-trained model; self-attention; word boundary information
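
The pipeline summarized in the abstract (weighting candidate segmentations from a frequency table, selecting the most likely one, and injecting the weights into self-attention) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the method as implemented in the paper: the toy counts in `FREQ`, the unigram scoring, the smoothing constant `SMOOTH`, the helper names, and the choice to inject the weights as an additive, symmetric bias on the attention logits.

```python
import math
from typing import Dict, List, Tuple

# Toy frequency table with hypothetical counts (not taken from the paper).
FREQ: Dict[str, int] = {
    "南京": 50, "南京市": 80, "市长": 60,
    "长江": 70, "大桥": 30, "长江大桥": 40,
}
SMOOTH = 1  # assumed count for a single character missing from the table


def segmentations(text: str, freq: Dict[str, int], max_len: int = 4) -> List[List[str]]:
    """Enumerate every split of `text` into table words or single characters."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, min(max_len, len(text)) + 1):
        word = text[:end]
        if end == 1 or word in freq:
            for rest in segmentations(text[end:], freq, max_len):
                results.append([word] + rest)
    return results


def log_score(seg: List[str], freq: Dict[str, int]) -> float:
    """Unigram log-probability of one segmentation (an assumed scoring rule)."""
    total = sum(freq.values())
    return sum(math.log(freq.get(word, SMOOTH) / total) for word in seg)


def segmentation_weights(text: str, freq: Dict[str, int]) -> List[Tuple[List[str], float]]:
    """Softmax-normalized weight per candidate segmentation, most likely first."""
    segs = segmentations(text, freq)
    scores = [log_score(seg, freq) for seg in segs]
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    norm = sum(exps)
    weighted = [(seg, e / norm) for seg, e in zip(segs, exps)]
    return sorted(weighted, key=lambda pair: -pair[1])


def boundary_bias(text: str, weighted: List[Tuple[List[str], float]]) -> List[List[float]]:
    """bias[i][j] = total weight of segmentations placing chars i and j in one word."""
    n = len(text)
    bias = [[0.0] * n for _ in range(n)]
    for seg, weight in weighted:
        start = 0
        for word in seg:
            end = start + len(word)
            for i in range(start, end):
                for j in range(start, end):
                    bias[i][j] += weight
            start = end
    return bias


def biased_attention(logits: List[List[float]], bias: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax over attention logits after adding the segmentation bias."""
    rows = []
    for logit_row, bias_row in zip(logits, bias):
        shifted = [l + b for l, b in zip(logit_row, bias_row)]
        top = max(shifted)
        exps = [math.exp(x - top) for x in shifted]
        norm = sum(exps)
        rows.append([e / norm for e in exps])
    return rows


text = "南京市长江大桥"
weighted = segmentation_weights(text, FREQ)
best_segmentation = weighted[0][0]  # ["南京市", "长江大桥"] under the toy counts
bias = boundary_bias(text, weighted)
# Uniform logits stand in for the query-key scores of a real model.
attention = biased_attention([[0.0] * len(text) for _ in text], bias)
```

With these toy counts, the unlikely reading that groups 市 and 长 into 市长 receives little weight, so the bias between those two characters stays small and their mutual attention is suppressed relative to characters that share a word in the dominant segmentation. In a real model the bias would be added to the scaled query-key scores rather than to uniform logits.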