A cluster-based data splitting method for small sample and class imbalance problems in impact damage classification

From collected experimental data, a rapid and precise classification model for impact damage modes (IDMs) can be developed using machine learning (ML) techniques to evaluate the impact-resistant capabilities of reinforced concrete (RC) building walls. However, experimental data are often scarce and imbalanced, which significantly degrades and destabilizes classification performance. In this study, an imbalanced four-class dataset consisting of 240 missile impact tests is employed, with the smallest class containing only 10 samples. The paper aims to develop an automated classification model for IDMs using a clustering-based within-class stratified splitting technique, named WICS, combined with a well-known oversampling technique, SMOTE-NC. The approach considers not only the between-class imbalance but also the within-class distribution in order to stabilize classification performance. Four classifiers and five data splitting techniques are implemented to assess classification performance. We found that the support vector machine (SVM) classifier using WICS and SMOTE-NC achieves the best micro F1 score (0.821), Cohen's kappa score (0.700), and AUC value (0.949) with highly stable performance. Friedman and Holm's post-hoc statistical tests also confirm that WICS+SMOTE-NC outperforms the other techniques. © 2022 Elsevier B.V. All rights reserved.
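The idea of stratifying a split within each class can be illustrated with a minimal sketch. This is not the authors' WICS implementation: the abstract does not specify the clustering algorithm or its settings, so the tiny k-means routine, the choice of `k=2`, the 20% test fraction, and the function names `kmeans` and `cluster_stratified_split` are all assumptions made for illustration. The sketch clusters the samples of each class and then draws the test portion from every cluster, so that both sides of the split reflect the within-class distribution. The SMOTE-NC oversampling step is not shown.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means (illustrative); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_stratified_split(X, y, test_frac=0.2, k=2, seed=0):
    """Hypothetical helper: split indices per (class, cluster) group."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train, test = [], []
    for label, idx in by_class.items():
        # Cluster only the samples of this class (k is capped by the class size).
        clusters = kmeans([tuple(X[i]) for i in idx], min(k, len(idx)), seed=seed)
        groups = defaultdict(list)
        for i, c in zip(idx, clusters):
            groups[c].append(i)
        for members in groups.values():
            rng.shuffle(members)
            n_test = round(len(members) * test_frac)
            if len(members) >= 2:
                # Hold out at least one sample, but always keep one for training.
                n_test = min(max(1, n_test), len(members) - 1)
            test.extend(members[:n_test])
            train.extend(members[n_test:])
    return train, test


# Toy data (made up): a two-blob majority class (0) and a small minority class (1).
data_rng = random.Random(1)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(10)]
X += [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(10)]
X += [(data_rng.gauss(0, 1), data_rng.gauss(5, 1)) for _ in range(6)]
y = [0] * 20 + [1] * 6
train, test = cluster_stratified_split(X, y)
```

In a full pipeline, oversampling would typically be fitted on the training indices only, so that synthetic samples never leak into the held-out portion.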

Impact damage; RC walls; Imbalanced dataset; Small dataset; Impact loading; CONCRETE; PERFORATION

Quoc Hoan Doan; Mai, Sy-Hung; Quang Thang Do; Thai, Duc-Kien

Sejong Univ

Natl Univ Civil Engn NUCE

Nha Trang Univ

2022

Applied Soft Computing

EI; SCI
ISSN:1568-4946
Year, volume (issue): 2022, 120