Sentiment classification of three-way borderline oversampling for imbalanced text
In practical applications, minority class samples often carry important information, yet traditional machine learning methods tend to classify them with low accuracy and high misclassification cost. This paper proposes two algorithms for the sentiment classification of imbalanced text data, three-way oversampling (3-way SMOTE, 3WOS) and three-way borderline-SMOTE (3WOBS), both of which combine three-way sampling (3WS) with oversampling. Oversampling enables better identification of data on the borderline, and edge oversampling enhances the information contained in the borderline. First, the text data are enclosed in a hypersphere and the support vectors on the hypersphere's edge are obtained. Second, 3WOS oversamples these edge support vectors directly, generating synthetic samples that are added to the sample set, whereas 3WOBS first checks given conditions to decide whether each synthetic sample should be generated and only then updates the sample set. Finally, the updated sample set is used to train different base classifiers for comparison experiments. Three imbalanced datasets are employed, each evaluated at several imbalance ratios. Moreover, granular computing is introduced during training to ensure the robustness of the model. Our experimental results show that 3WOS-ITSC and 3WOBS-ITSC achieve higher accuracy and lower misclassification cost than other models, offering a new approach to imbalanced text classification.
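The edge-oversampling step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the minority samples are already numeric feature vectors, and it substitutes a simple stand-in for the hypersphere support vectors: the samples farthest from the minority centroid. The function names, the `fraction` and `k` parameters, and the `accept` predicate (a placeholder for the unspecified 3WOBS conditions) are all hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def boundary_points(points, fraction=0.3):
    """Assumed stand-in for edge support vectors: the points farthest
    from the centroid (a real system would fit a hypersphere model)."""
    c = centroid(points)
    ranked = sorted(points, key=lambda p: math.dist(p, c), reverse=True)
    count = max(1, math.ceil(fraction * len(points)))
    return ranked[:count]


def oversample_boundary(minority, n_new, k=3, fraction=0.3,
                        accept=None, seed=0):
    """SMOTE-style interpolation seeded only from boundary points.

    accept=None adds every synthetic sample (3WOS-like behaviour).
    A predicate keeps only samples that pass it, a hypothetical
    placeholder for the acceptance conditions of 3WOBS.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    border = boundary_points(minority, fraction)
    synthetic = []
    attempts = 0
    # Bounded loop so that a strict predicate cannot run forever.
    while len(synthetic) < n_new and attempts < 10 * n_new:
        attempts += 1
        base = rng.choice(border)
        # k nearest minority neighbours of the chosen boundary point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        candidate = [b + gap * (m - b) for b, m in zip(base, mate)]
        if accept is None or accept(candidate):
            synthetic.append(candidate)
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for sample in oversample_boundary(minority, n_new=4):
        print(sample)
```

Each synthetic sample lies on the segment between a boundary point and one of its nearest minority neighbours, so new samples concentrate near the edge of the minority region rather than in its interior. The updated minority set would then be merged with the majority samples before training a base classifier.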