Research on Data Augmentation for Extractive Reading Comprehension
In extractive reading comprehension, language models perform poorly when training data is scarce. Although the EDA (easy data augmentation) method can effectively increase the amount of data, it causes a loss of semantic information in the data, which degrades model training. To address these problems, a data augmentation method built on EDA is proposed for extractive reading comprehension in the few-sample setting. The data is augmented at the word level and the sentence level while the correct answers to the questions are retained. At the same time, augmentation is applied to the single word with the least impact on sentence semantics: the data augmentation method based on semantic similarity (DASS) computes the semantic similarity of a sentence before and after a word is deleted to determine that word's impact on sentence semantics. The word with the least impact on semantics is selected for augmentation, which improves the quality of the training data and, in turn, the robustness of the language model. Experimental results on HotpotQA show that DASS can address the problem of insufficient semantic information when the number of samples is small. With 500 samples, the F1 score of the model increases by 23.94%. When the method is applied to the whole dataset, the F1 score of the model increases by 2.54%.
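The word-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which sentence encoder or similarity measure DASS uses, so a toy character-trigram vector with cosine similarity stands in for it here. The names `embed`, `cosine`, `least_impact_index`, and `augment`, the `protected` set of answer tokens, and the `synonyms` table are all hypothetical.

```python
import math
from collections import Counter


def embed(tokens):
    """Toy sentence vector: character-trigram counts.

    Assumption: a stand-in for whatever sentence encoder DASS actually uses.
    """
    vec = Counter()
    for tok in tokens:
        padded = f"#{tok.lower()}#"
        for i in range(len(padded) - 2):
            vec[padded[i:i + 3]] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def least_impact_index(tokens, protected):
    """Find the non-answer token whose deletion changes the sentence least.

    Each candidate token is deleted in turn, and the reduced sentence is
    compared with the original. The highest similarity marks the token with
    the least semantic impact. Tokens in `protected` (lowercased answer
    tokens) are skipped so the answer span is never touched.
    """
    full = embed(tokens)
    best_idx, best_sim = None, -1.0
    for i, tok in enumerate(tokens):
        if tok.lower() in protected:
            continue
        reduced = embed(tokens[:i] + tokens[i + 1:])
        sim = cosine(full, reduced)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx, best_sim


def augment(tokens, protected, synonyms):
    """Apply an EDA-style synonym replacement to the least-impact token.

    Assumption: `synonyms` is a hypothetical lookup table. If the selected
    token has no entry, the sentence is returned unchanged.
    """
    idx, _ = least_impact_index(tokens, protected)
    if idx is None:
        return list(tokens)
    out = list(tokens)
    out[idx] = synonyms.get(out[idx].lower(), out[idx])
    return out


if __name__ == "__main__":
    sentence = ["Paris", "is", "the", "capital", "of", "France"]
    answer_tokens = {"paris"}  # the answer span must survive augmentation
    print(least_impact_index(sentence, answer_tokens))
    print(augment(sentence, answer_tokens, {"is": "remains"}))
```

With a real sentence encoder in place of `embed`, the same loop would rank words by how little their deletion shifts the sentence representation. Restricting the search to non-answer tokens is one simple way to keep the correct answer intact, which the abstract states as a precondition.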