Sampling Algorithm for Reducing Class Overlap in Lumbar Disc Samples
Class overlap in medical data can severely degrade the performance of intelligent disease diagnosis. To mitigate the negative impact of class overlap in lumbar disc samples on classifiers, this paper proposes CO_HS, a hybrid sampling algorithm for reducing class overlap. The algorithm divides the training samples into core samples, boundary samples, and noise samples, and samples from the overlapping region to reduce the degree of class overlap in the dataset. The new training samples generated by the CO_HS algorithm are used to train classification models such as Random Forest (RF), yielding six new classifiers for lumbar disc degeneration. Experimental results indicate that the new classifiers improve substantially across multiple performance metrics: accuracy increases by 7.8 to 12.7 percentage points, the kappa coefficient by 11.6 to 20.2 percentage points, sensitivity by 7.9 to 16.8 percentage points, specificity by 9.0 to 18.2 percentage points, and the F-measure by 9.4 to 18.4 percentage points. The CO_HS algorithm is therefore an effective method for addressing class overlap and improving classification performance.
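
To make the division into core, boundary, and noise samples concrete, the following is a minimal sketch and not the CO_HS procedure itself. The abstract does not state how the three groups are defined, so the sketch assumes a common k-nearest-neighbor rule: a sample is treated as core if all of its k nearest neighbors share its class, as noise if none do, and as boundary otherwise. The function name `partition_samples`, the parameter `k`, and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def partition_samples(X, y, k=5):
    """Tag each sample as 'core', 'boundary', or 'noise'.

    Assumed rule (not taken from the paper): compare each sample's class
    with the classes of its k nearest neighbors (Euclidean distance).
    """
    tags = []
    for i, xi in enumerate(X):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        same = sum(1 for j in neighbors if y[j] == y[i])
        if same == len(neighbors):
            tags.append("core")      # fully surrounded by its own class
        elif same == 0:
            tags.append("noise")     # fully surrounded by the other class
        else:
            tags.append("boundary")  # mixed neighborhood: overlapping region
    return tags


# Toy data: two well-separated clusters plus one class-1 point placed
# inside the class-0 cluster to create a small overlapping region.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

tags = partition_samples(X, y, k=3)
print(Counter(tags))
```

Under this assumed rule, the stray class-1 point is tagged as noise, the class-0 points around it fall into the boundary group, and the isolated class-1 cluster is tagged as core. A hybrid sampler would then act on the boundary and noise groups, for example by removing or resampling them, while leaving core samples untouched; how CO_HS does this is specified in the body of the paper rather than in the abstract.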