An unsupervised gene selection method based on multivariate normalized mutual information of genes

Full-text sources:

NSTL
Elsevier

Abstract: Gene expression data analysis has always been challenging because both samples and genes are complex and high-dimensional. In microarray gene expression data, the number of samples is generally much smaller than the number of genes. Treating such imbalanced data as a machine learning task risks producing an over-fitted model, with reduced predictive power and poor readability of the genetic data. These problems can be significantly mitigated by choosing the more informative genes. Unsupervised gene selection techniques can estimate the relations among genes well. Although mutual information and symmetric uncertainty estimate gene relevancy well, they are bivariate measures and ignore possible dependencies among several genes. To address this issue, we propose an unsupervised gene selection scheme based on information-theoretic measures. It uses a similarity-based algorithm for gene clustering and then introduces virtual genes as representatives of the gene clusters. These representative genes share the most information with the genes in their own clusters and are least similar to the representatives of other clusters. Experimental results on benchmark microarray gene expression datasets demonstrate the effectiveness of our approach compared with several information-theoretic schemes, as well as prototype- and density-based clustering methods, in both unsupervised and supervised scenarios.
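The abstract's remark that bivariate measures miss dependencies among several genes can be illustrated with a small sketch. It is not the authors' algorithm, and it does not implement their multivariate normalized mutual information, whose definition is not given in this record. It uses only the standard textbook definitions of mutual information, symmetric uncertainty, and total correlation on discretized values. The toy "genes" `g1`, `g2`, `g3` and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), defined as 0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    return 2.0 * mutual_information(x, y) / (hx + hy) if hx + hy > 0 else 0.0


def total_correlation(*columns):
    """C(X1..Xn) = sum of marginal entropies minus the joint entropy."""
    return sum(entropy(c) for c in columns) - entropy(*columns)


# Hypothetical toy data: four samples, three binarized "genes".
# g1 and g2 take every combination once; g3 is their XOR.
g1, g2 = zip(*product([0, 1], repeat=2))
g3 = tuple(a ^ b for a, b in zip(g1, g2))

pairwise_su = [
    symmetric_uncertainty(a, b)
    for a, b in ((g1, g2), (g1, g3), (g2, g3))
]
joint_dependency = total_correlation(g1, g2, g3)
```

In this toy case every pairwise symmetric uncertainty is zero, while the total correlation of the three columns is one bit, so the dependency is visible only to a multivariate measure.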

Keywords:

Unsupervised gene selection; Gene clustering; Microarray gene expression; Information theory; Total correlation; Multivariate normalized mutual information; CLASSIFICATION; FEATURES

Authors:

Rahmanian, Mohsen; Mansoori, Eghbal G.

Affiliation:

Shiraz Univ

Publication year:

2022

DOI:

10.1016/j.chemolab.2022.104512

Chemometrics and Intelligent Laboratory Systems

Indexed in: EI, SCI

ISSN: 0169-7439

Year, volume (issue): 2022, 222

Citations: 4
References: 44