An alert-situation text data augmentation method based on MLM

The performance of deep learning models relies heavily on the quality and quantity of training data, and insufficient training data leads to overfitting. In alert-situation text classification, however, it is usually difficult to obtain a large amount of training data. This paper proposes a text data augmentation method based on a masked language model (MLM), aiming to improve the generalization capability of deep learning models by expanding the training data. The method uses a Mask strategy to randomly conceal words in the text, then leverages contextual information to predict and replace the masked words with the MLM, thereby generating new training data. Three Mask strategies (character level, word level, and N-gram) are designed, and the performance of each strategy under different Mask ratios is analyzed. The experimental results show that the word-level Mask strategy outperforms traditional data augmentation methods.
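The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the function names (`mask_tokens`, `augment`), the `[MASK]` placeholder, and the rounding of the Mask ratio are all assumptions. The `fill` callable is a hypothetical stand-in for the MLM and is replaced here by a toy function. Character-level masking is obtained by passing a list of characters, word-level masking by passing pre-segmented words, and N-gram masking by setting `span_len` greater than 1.

```python
import random
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"  # assumed placeholder; real MLMs define their own mask token


def mask_tokens(
    tokens: Sequence[str], ratio: float, rng: random.Random, span_len: int = 1
) -> Tuple[List[str], List[int]]:
    """Mask roughly `ratio` of the tokens.

    span_len=1 masks isolated tokens (character or word level, depending on
    how the text was split); span_len>1 masks contiguous runs (N-gram level).
    Returns the masked sequence and the sorted masked positions.
    """
    n = len(tokens)
    target = max(1, round(n * ratio)) if n else 0
    masked = set()
    starts = list(range(n))
    rng.shuffle(starts)  # random candidate start positions
    for start in starts:
        if len(masked) >= target:
            break
        masked.update(range(start, min(start + span_len, n)))
    out = [MASK if i in masked else tok for i, tok in enumerate(tokens)]
    return out, sorted(masked)


def augment(
    tokens: Sequence[str],
    ratio: float,
    fill: Callable[[List[str], List[int]], List[str]],
    rng: random.Random,
    span_len: int = 1,
) -> List[str]:
    """Mask tokens, then replace each mask with the prediction from `fill`.

    `fill` is a hypothetical hook for the MLM: it receives the masked
    sequence and the masked positions and returns one token per position.
    """
    masked, positions = mask_tokens(tokens, ratio, rng, span_len)
    predictions = fill(masked, positions)
    out = list(masked)
    for pos, tok in zip(positions, predictions):
        out[pos] = tok
    return out


def toy_fill(masked: List[str], positions: List[int]) -> List[str]:
    """Toy stand-in for the MLM: returns a fixed token for every mask."""
    return ["<pred>"] * len(positions)


words = ["a", "report", "of", "theft", "near", "the", "station", "was", "filed", "today"]
print(augment(words, 0.3, toy_fill, random.Random(0)))               # word level
print(augment(list("theft"), 0.2, toy_fill, random.Random(0)))       # character level
print(augment(words, 0.3, toy_fill, random.Random(0), span_len=2))   # N-gram (n=2)
```

In practice `toy_fill` would be replaced by a call to a pretrained masked language model, and the word-level case would rely on a word segmenter for Chinese text; both choices are outside the scope of this sketch.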

Keywords: deep learning; text data augmentation; masked language model (MLM); alert-situation text classification

DING Weijie, MAO Tingyun, CHEN Lili, ZHOU Mingwei, YUAN Ying, HU Wentao

Key Laboratory of Public Security Information Application Based on Big-Data Architecture, Ministry of Public Security, Hangzhou 310053, P.R. China

Department of Computer and Information Security, Zhejiang Police College, Hangzhou 310053, P.R. China

Zhejiang Dahua Technology Co., Ltd, Hangzhou 310053, P.R. China

2024

High Technology Letters (English edition)
Institute of Scientific and Technical Information of China (ISTIC)

Impact factor: 0.058
ISSN: 1006-6748
Year, volume (issue): 2024, 30(4)