Computer Science, 2024, Vol. 51, Issue (z1): 627-633. DOI: 10.11896/jsjkx.230500006

Cancer Subtype Prediction Based on Similar Network Fusion Algorithm

张晓茜 (1), 李东喜 (2)

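Author information

  • 1. College of Mathematics, Taiyuan University of Technology, Taiyuan 030600
  • 2. College of Data Science, Taiyuan University of Technology, Taiyuan 030600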

Abstract

Mining the interactions between genes from gene expression data and constructing gene regulatory networks is an important research topic in bioinformatics. However, currently popular neural networks consider only the interactions and associations between genes in their architecture, not those between patients. This paper therefore proposes a cancer subtype prediction model based on a fusion algorithm that combines a weighted gene similarity network with a sample similarity network, named WGCSS (Weighted Genetic Correlation network and Sample Similarity network). The method fuses information from the feature space and the sample space, accounts for interactions both between genes and between samples, and uses a graph convolutional network for prediction. Because aggregating information in two spaces causes a serious over-smoothing problem, a residual layer is introduced into the model to alleviate it. By aggregating the data information in both spaces, the method makes cancer subtype prediction more accurate. To verify its generalization performance, datasets for breast invasive carcinoma (BRCA), glioblastoma multiforme (GBM), and lung cancer (LUNG) are analyzed, and the resulting high classification accuracy indicates the superiority of the method. Survival analysis is also performed on the three datasets and shows that the survival curves of the cancer subtypes obtained by the method differ significantly on all three.
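
The abstract does not give the model's equations, so the following is only a minimal NumPy sketch of the general idea of a graph convolution step with a residual (skip) connection over a sample similarity graph. Everything in it is an assumption made for illustration: the function names `sample_similarity`, `normalize_adjacency`, and `residual_gcn_layer`, the k-nearest-neighbour graph built from Pearson correlation, the symmetric normalization, and the synthetic data. It should not be read as the WGCSS implementation.

```python
import numpy as np

def sample_similarity(x, k=2):
    """Hypothetical sample graph: link each sample to its k most correlated samples."""
    corr = np.corrcoef(x)                      # Pearson correlation between rows (samples)
    np.fill_diagonal(corr, -np.inf)            # exclude self-matches from the neighbour search
    idx = np.argsort(-corr, axis=1)[:, :k]     # indices of the k most similar samples per row
    adj = np.zeros(corr.shape)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)              # symmetrize the graph

def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def residual_gcn_layer(a_norm, h, w):
    """One propagation step ReLU(A_norm @ H @ W) plus a skip connection back to H."""
    return np.maximum(a_norm @ h @ w, 0.0) + h

# Synthetic stand-in for an expression matrix: 6 samples, 4 gene features.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
adj = sample_similarity(x, k=2)
a_norm = normalize_adjacency(adj)
w = rng.normal(size=(4, 4)) * 0.1              # square weights so the skip connection type-checks
h1 = residual_gcn_layer(a_norm, x, w)
```

The skip connection adds each sample's own input features back after neighbourhood averaging, which is the usual intuition for why residual layers reduce over-smoothing: repeated propagation alone tends to pull all node representations toward each other.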

Key words

Weighted gene similarity network / Sample similarity network / Residual graph convolutional network / L1 regularization / Cancer subtype prediction

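Funding

National Natural Science Foundation of China (11571009)

Applied Basic Research Program of Shanxi Province (201901D111086)

Key Research and Development Program of Shanxi Province (202102020101004)

Research Project for Returned Overseas Scholars of Shanxi Province (2022-074)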

Publication year

2024
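Journal: Computer Science
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Indexing: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Number of references: 23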