Gene Feature Extraction Method Based on Cross-modal Nearest Neighbor Manifold Scatter

Gene expression data are high-dimensional, small in sample size, and noisy, which makes it difficult to extract effective features for gene classification. To address this problem, this paper proposes the cross-modal nearest neighbor manifold scatter (CNNMS) method. Building on the kernel method, CNNMS works with nearest neighbor data, which further reduces the impact of class imbalance on classification accuracy. In addition, because the nearest neighbor mean is relatively insensitive to outliers, CNNMS maps the high-dimensional gene features into the kernel space and defines the mean distance between each sample and its nearest neighbor samples as that sample's nearest neighbor mean, so that the cross-modal nearest neighbor manifold scatter subspace preserves the within-class clustering of features to the greatest possible extent. Experimental results show that CNNMS achieves a classification recognition rate above 98% on a lung cancer gene expression dataset and a good recognition rate on a gastric cancer gene expression dataset, with better classification ability than the other methods compared. CNNMS thus shows a high recognition rate in gene classification and is of considerable significance for research on gene feature extraction.
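The nearest neighbor mean described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the kernel, the neighborhood size, or how class labels enter the computation, so the RBF kernel, the parameters `gamma` and `k`, and the function names below are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_distances(K):
    """Pairwise distances in the kernel-induced feature space:
    ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    d2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.maximum(d2, 0.0))

def nearest_neighbor_mean(X, k=3, gamma=1.0):
    """For each sample, the mean kernel-space distance to its k nearest
    neighbors (hypothetical helper; requires k <= n_samples - 1)."""
    D = kernel_distances(rbf_kernel(X, gamma))
    np.fill_diagonal(D, np.inf)            # a sample is not its own neighbor
    nearest = np.sort(D, axis=1)[:, :k]    # k smallest distances per row
    return nearest.mean(axis=1)
```

Averaging over several neighbors rather than relying on a single distance is what makes the quantity less sensitive to an isolated outlier, which is the property the abstract appeals to.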

gene feature extraction; canonical correlation analysis; data dimensionality reduction; gene classification; nearest neighbor scatter; discrimination sensitivity; cancer diagnosis

Wang Mengming (王孟明), Zhang Zhipeng (张志鹏), Hou Yakui (侯雅魁)

School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001

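National Natural Science Foundation of China (61806006); Natural Science Research Project of Anhui Higher Education Institutions (2022AH040113); Natural Science Research Project of Anhui Higher Education Institutions (KJ2018A0083)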

2024

Journal of Hubei Minzu University (Natural Science Edition)
湖北民族学院 (the former name of Hubei Minzu University)

Impact factor: 0.458
ISSN:2096-7594
Year, volume (issue): 2024, 42(1)