An unsupervised gene selection method based on multivariate normalized mutual information of genes
NSTL
Elsevier
Gene expression data analysis has always been challenging because the data are complex and high-dimensional in both samples and genes. In microarray gene expression data, the number of samples is generally much smaller than the number of genes. Treating such imbalanced data as a machine learning task risks producing an over-fitted model, reducing predictive power, and making the genetic data harder to interpret. These problems can be substantially reduced by selecting the more informative genes. Unsupervised gene selection techniques can estimate the relations among genes well. Although mutual information and symmetric uncertainty estimate gene relevancy well, they are bivariate measures and ignore possible dependencies among several genes. To address this issue, we propose an unsupervised gene selection scheme based on information-theoretic measures. It uses a similarity-based algorithm to cluster genes and then introduces virtual genes as representatives of the gene clusters. Each representative gene shares the most information with the genes in its own cluster and has the least similarity to the representatives of other clusters. Experimental results on benchmark microarray gene expression datasets demonstrate the effectiveness of our approach compared with several information-theoretic schemes as well as prototype- and density-based clustering methods, in both unsupervised and supervised scenarios.
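For readers unfamiliar with the bivariate measures named in the abstract, the following is a minimal sketch of mutual information and symmetric uncertainty between two genes. It is an illustration only, not the authors' multivariate method. It assumes expression values have already been discretized into integer levels, and the function names (`entropy`, `mutual_information`, `symmetric_uncertainty`) are hypothetical.

```python
import numpy as np


def entropy(levels):
    """Shannon entropy (in bits) of a 1-D array of discrete levels."""
    _, counts = np.unique(levels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete arrays of equal length."""
    pairs = np.stack([x, y], axis=1)  # one row per sample: (x_i, y_i)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    joint_entropy = float(-(p * np.log2(p)).sum())
    return entropy(x) + entropy(y) - joint_entropy


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)); defined as 0 if both entropies are 0."""
    total = entropy(x) + entropy(y)
    if total == 0.0:
        return 0.0
    return 2.0 * mutual_information(x, y) / total


# Made-up discretized expression levels for three genes (assumption, not real data).
gene_a = np.array([0, 0, 1, 1])
gene_b = np.array([0, 0, 1, 1])  # identical to gene_a
gene_c = np.array([0, 1, 0, 1])  # statistically independent of gene_a in this sample

su_same = symmetric_uncertainty(gene_a, gene_b)
su_indep = symmetric_uncertainty(gene_a, gene_c)
```

Because each call compares exactly two genes, a measure like this cannot see a dependency that only appears when three or more genes are considered jointly, which is the limitation the abstract sets out to address.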