Study Results from Donghua University Broaden Understanding of Machine Learning (Handling Missing Values and Imbalanced Classes In Machine Learning To Predict Consumer Preference: Demonstrations and Comparisons To Prominent Methods)

Abstract

By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News - A new study on Machine Learning is now available. According to news reporting originating in Shanghai, People's Republic of China, by NewsRx journalists, research stated, "Consumer preference prediction aims to predict consumers' future purchases based on their historical behavior-level data. Using machine learning algorithms, the prediction results provide evidence to conduct commercial activities and further improve consumer experiences."

Funders for this research include the National Natural Science Foundation of China (NSFC), the Fundamental Research Funds for the Central Universities and Graduate Student Innovation Fund of Donghua University, and Grants-in-Aid for Scientific Research (KAKENHI).

The news reporters obtained a quote from the research from Donghua University, "However, missing values and imbalanced class problems of consumer behavioral data always make machine learning algorithms ineffective. While several methods have been proposed to address missing data or imbalanced class problems, few works have considered the relationships among missing mechanisms, imputation algorithms, imbalanced class methods, and the effectiveness of classification algorithms that use imputed data. In this study, we aim to propose an adaptive process for selecting the optimal combination of amputation, imputation, imbalance treatment, and classification based on classification performance. Our research extends the literature by showing significant interaction effects between 1) the amputation mechanism and imputation algorithms, 2) imputation and imbalance treatments, and 3) imbalance treatments and classification algorithms. Using three consumer behavioral datasets from the UCI Machine Learning Repository, we empirically show that, among different classification methods, the overall performance of Random Forest is better than that of Logit, SVM, or Decision Tree. Moreover, Logit, as the most widely used classification method, suffers most from imbalance issues in real-world datasets."
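The selection process described in the quote can be pictured as a grid search over pipeline stages, scored by classification performance. The following is a minimal sketch under stated assumptions, not the authors' implementation: every function name is hypothetical, the data are synthetic, only one amputation mechanism (missing completely at random, MCAR) is simulated, imputation is limited to column mean or median, imbalance treatment is limited to random oversampling, and two simple stand-in classifiers (nearest centroid and k-nearest neighbors) replace the Random Forest, Logit, SVM, and Decision Tree models named in the study. Balanced accuracy as the selection metric is also an assumption.

```python
import itertools
import random
import statistics

def make_data(n_major, n_minor, rng):
    """Synthetic two-feature data: majority class 0 near 0, minority class 1 near 2."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_major)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_minor)]
    return X, [0] * n_major + [1] * n_minor

def amputate_mcar(X, rate, rng):
    """Amputation (MCAR): blank each cell independently with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in X]

def column_fill(X, strategy):
    """Per-column fill value (mean or median) computed from observed cells only."""
    stat = statistics.mean if strategy == "mean" else statistics.median
    fills = []
    for col in zip(*X):
        observed = [v for v in col if v is not None]
        fills.append(stat(observed) if observed else 0.0)
    return fills

def apply_fill(X, fills):
    """Imputation: replace each missing cell with its column's fill value."""
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in X]

def oversample(X, y, rng):
    """Random oversampling: duplicate rows of smaller classes up to the largest class size."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out += rows + extra
        y_out += [label] * target
    return X_out, y_out

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier: predict the class whose feature centroid is closest."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.mean(col) for col in zip(*rows)]
    return [min(centroids, key=lambda l: sq_dist(row, centroids[l])) for row in X_test]

def knn(X_train, y_train, X_test, k=5):
    """Stand-in classifier: majority vote among the k nearest training rows."""
    preds = []
    for row in X_test:
        nearest = sorted(zip(X_train, y_train), key=lambda p: sq_dist(row, p[0]))[:k]
        votes = [label for _, label in nearest]
        preds.append(max(sorted(set(votes)), key=votes.count))
    return preds

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the minority class counts as much as the majority."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def run_grid(seed=0):
    """Score every imputer x imbalance treatment x classifier combination, best first."""
    rng = random.Random(seed)
    X_train, y_train = make_data(180, 20, rng)
    X_test, y_test = make_data(90, 10, rng)
    X_train = amputate_mcar(X_train, 0.2, rng)
    X_test = amputate_mcar(X_test, 0.2, rng)
    classifiers = {"nearest_centroid": nearest_centroid, "knn": knn}
    results = []
    for imp, treat, clf in itertools.product(["mean", "median"], ["none", "oversample"], classifiers):
        fills = column_fill(X_train, imp)  # fit fill values on training data only
        Xtr, ytr = apply_fill(X_train, fills), list(y_train)
        Xte = apply_fill(X_test, fills)
        if treat == "oversample":
            Xtr, ytr = oversample(Xtr, ytr, random.Random(seed))
        preds = classifiers[clf](Xtr, ytr, Xte)
        results.append((balanced_accuracy(y_test, preds), imp, treat, clf))
    return sorted(results, reverse=True)

results = run_grid(seed=0)
for score, imp, treat, clf in results:
    print(f"{score:.3f}  imputer={imp}  treatment={treat}  classifier={clf}")
```

The first row printed is the combination this toy grid would select. Because each stage is swapped independently, the same loop also exposes how the stages interact: a treatment that helps one classifier may do little for another, which is the kind of interaction effect the quote refers to.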

Key words

Shanghai/People's Republic of China/Asia/Algorithms/Cyborgs/Emerging Technologies/Machine Learning/Donghua University

Publication year

2024

Robotics & Machine Learning Daily News
