To address the "curse of dimensionality" and overfitting problems that high-dimensional data cause in text classification and computer vision, this study proposes an unsupervised feature extraction method that combines feature self-representation with a greedy algorithm. The method expresses each feature as a linear combination of the other features to form a feature self-representation model, which is then optimized with a greedy algorithm. In the experiments, the proposed method ran in only 0.13 seconds. In terms of accuracy, the average scores of the maximum variance method, principal component analysis, regularization representation, and the proposed unsupervised feature extraction method were 67.24%, 80.16%, 83.48%, and 83.58%, respectively, so the proposed method achieved the highest average accuracy of the four. These results demonstrate the effectiveness of feature extraction combined with a greedy algorithm in reducing time complexity and improving accuracy, and they provide a new perspective for future work on unsupervised feature extraction.
Key words
greedy algorithm/feature extraction/unsupervised/high-dimensional big data
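
The abstract describes representing each feature linearly by other features and optimizing that model greedily, but it does not give the objective function or the selection rule. The following is therefore only a minimal sketch under stated assumptions: it uses a plain least-squares reconstruction error as the objective and forward greedy selection as the optimizer. The function name `greedy_self_representation` and the parameter `k` are hypothetical and do not come from the paper.

```python
import numpy as np


def greedy_self_representation(X, k):
    """Greedily pick k feature (column) indices of X.

    Assumed objective (not taken from the paper): at each step, add the
    feature that most reduces the squared Frobenius error of reconstructing
    all features as linear combinations of the selected ones.
    """
    n_samples, n_features = X.shape
    Xc = X - X.mean(axis=0)  # centre each feature; the input is not modified
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(n_features):
            if j in selected:
                continue
            cols = Xc[:, selected + [j]]
            # Least-squares coefficients W such that cols @ W approximates Xc
            W, *_ = np.linalg.lstsq(cols, Xc, rcond=None)
            err = np.linalg.norm(Xc - cols @ W) ** 2
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected
```

A call such as `greedy_self_representation(X, 10)` on an `(n_samples, n_features)` array returns ten column indices. This naive version solves one least-squares problem per candidate per step, so it is meant to illustrate the idea and makes no claim about the 0.13-second runtime reported above.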