Long Text Candidate Paragraph Extraction Based on Improved Transformer
To improve the performance of extractive machine reading comprehension, a long-text candidate paragraph extraction model is built in the data preprocessing stage to improve the quality of candidate answers. For word embedding, the N-gram stroke features are augmented with position information to resolve the ambiguity of the cw2vec word vector model in learning Chinese stroke structure. For deep feature extraction, the sparse self-attention matrix of the Sparse Transformer is adopted to address the high computational complexity and long feature extraction time of standard self-attention. Experiments on the DuReader dataset show that the constructed paragraph extraction model achieves an average precision of 0.6642 and a mean reciprocal rank of 0.6694.
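To illustrate the efficiency idea behind sparse self-attention, the sketch below implements a simple local-band sparsity pattern in NumPy: each position attends only to neighbours within a fixed window, so only O(n·window) of the n×n score matrix is active. The banded pattern, the `window` parameter, and all variable names are illustrative assumptions for exposition, not the exact factorized pattern used by the model in this paper.

```python
import numpy as np

def sparse_self_attention(X, W_q, W_k, W_v, window=2):
    """Windowed (local) sparse self-attention sketch.

    Each of the n positions attends only to positions within `window`
    steps of itself, leaving O(n * window) unmasked score entries
    instead of the O(n^2) of dense self-attention. This is an
    illustrative sparsity pattern, not the paper's exact one.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)

    # Mask out all positions outside the local band.
    idx = np.arange(n)
    band = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(band, scores, -np.inf)

    # Row-wise softmax over the surviving (unmasked) entries.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

# Toy usage: 8 tokens with 6-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 6))
W_q, W_k, W_v = (rng.normal(size=(6, 6)) for _ in range(3))
out = sparse_self_attention(X, W_q, W_k, W_v, window=2)
print(out.shape)  # (8, 6)
```

With `window=2`, each row of the attention matrix has at most 5 non-zero weights regardless of sequence length, which is the source of the complexity reduction the abstract refers to.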