Textual Data Augmentation Blending TF-IDF and Pre-Trained Model
To improve the performance of textual data augmentation (TDA) in natural language processing, a novel TDA algorithm is proposed that blends the TF-IDF algorithm with the BERT pre-trained language model. First, unlike the random selection strategy of traditional token selection methods, the proposed method uses the TF-IDF algorithm to select the least informative words as the tokens to rewrite, thereby avoiding rewriting tokens that play a key role in the semantics. Then, since most existing data augmentation methods depend heavily on the input samples, which limits the diversity of the augmented samples, the pre-trained language model BERT is blended into the proposed method to predict each selected token and replace it with the predicted result. Experimental results demonstrate that the proposed TDA algorithm improves the performance of deep learning models by 5.8% and outperforms the existing TDA algorithms.
Natural language processing; Deep learning; Textual data augmentation; Pre-trained language model
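To make the two stages described in the abstract concrete, the sketch below shows one possible way to combine TF-IDF word weighting with BERT masked-token prediction, using scikit-learn's TfidfVectorizer and the Hugging Face transformers fill-mask pipeline with bert-base-uncased. It is an illustrative approximation under these library choices, not the authors' implementation; the function name tfidf_bert_augment, the n_replace parameter, and the toy corpus are assumptions.

```python
# A minimal sketch of the two-stage augmentation idea described above, not the
# authors' implementation: low-TF-IDF (uninformative) words are selected for
# rewriting, masked, and replaced with predictions from a BERT masked-language
# model. The function name, n_replace parameter, and toy corpus are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline


def tfidf_bert_augment(sentence, corpus, n_replace=2):
    # Fit TF-IDF on the corpus so word informativeness is estimated globally.
    vectorizer = TfidfVectorizer()
    vectorizer.fit(corpus)
    row = vectorizer.transform([sentence])
    vocab = vectorizer.vocabulary_

    def weight(word):
        idx = vocab.get(word.lower())
        # Out-of-vocabulary words get infinite weight so they are never rewritten.
        return row[0, idx] if idx is not None else float("inf")

    words = sentence.split()
    # The lowest-weighted words are treated as the least informative and are
    # therefore the safest to rewrite without damaging the sentence semantics.
    targets = sorted(range(len(words)), key=lambda i: weight(words[i]))[:n_replace]

    # BERT's masked-language-model head predicts a replacement for each target.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    augmented = list(words)
    for i in targets:
        masked = " ".join(fill_mask.tokenizer.mask_token if j == i else w
                          for j, w in enumerate(augmented))
        augmented[i] = fill_mask(masked)[0]["token_str"]
    return " ".join(augmented)


if __name__ == "__main__":
    corpus = [
        "the service at this restaurant was quite slow",
        "the food tasted fresh and the staff were friendly",
    ]
    print(tfidf_bert_augment("the service at this restaurant was quite slow", corpus))
```

In this sketch the key semantic words (high TF-IDF weight) are left untouched, while the replacement candidates come from BERT's contextual predictions rather than from the input sample itself, which is the property the abstract credits for the improved diversity of the augmented samples.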