Journal of Gannan Medical University, 2024, Vol. 44, Issue (5): 441-449. DOI: 10.3969/j.issn.1001-5779.2024.05.001

Identification and verification of key genes in breast cancer based on paired sample microarray data

Liu Mengying 1, Cai Hao 1, He Ming 1, Luo Yun 1, Guo You 1
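Author information

  • 1. Research Center for Medical Big Data and Bioinformatics, First Affiliated Hospital of Gannan Medical University, Ganzhou, Jiangxi 341000, China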

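Abstract (translated from the Chinese)

Objective: To identify key genes in breast cancer by combining high-throughput screening with bioinformatics analysis, and thereby to provide guidance for exploring potential molecular markers of breast cancer and improving understanding of the mechanisms underlying its occurrence and development.

Methods: ⑴ The required microarray datasets were selected from the GEO database, the differentially expressed genes in each dataset were analyzed online with GEO2R, and a Venn diagram was drawn to obtain the commonly differentially expressed genes (CDEGs). ⑵ GO and KEGG enrichment analyses of the CDEGs were performed using the DAVID database. ⑶ A visualized protein-protein interaction (PPI) network of the CDEGs was constructed with the STRING database and Cytoscape software, and the key genes of breast cancer were computed with the CytoHubba plugin. ⑷ The expression of these key genes in breast cancer tissues and normal tissues was verified using the selected microarray datasets and the GEPIA database. ⑸ Prognostic analysis of the key genes was performed with the Kaplan-Meier plotter database, and survival curves were drawn.

Results: ⑴ Three datasets, GSE15852, GSE109169 and GSE33447, were selected, and a total of 355 CDEGs were identified. ⑵ The CDEGs were involved in multiple biological processes and signaling pathways, among which extracellular exosome, proteinaceous extracellular matrix, nuclear chromatin, plasma membrane, extracellular space, the PPAR signaling pathway, tyrosine metabolism and retinaldehyde metabolism were closely related to breast cancer. ⑶ The PPI network built from the CDEGs consisted of 298 nodes and 1 327 edges, and AURKA, CDK1, MCM4, TOP2A, RRM2, PRC1, HMMR, MELK, GINS2 and UHRF1 were identified as key genes of breast cancer. ⑷ The three datasets (GSE15852, GSE109169 and GSE33447) and the GEPIA database verified that, compared with normal tissues, the expression of the key genes was increased in breast cancer tissues. ⑸ Survival analysis showed that the OS of patients with high expression of each key gene was significantly lower than that of patients with low expression (P<0.001).

Conclusion: The key genes of breast cancer are closely related to its occurrence and prognosis, offering new ideas for the clinical diagnosis of breast cancer and for improving patient prognosis.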
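The Venn-diagram step in the Methods amounts to intersecting the per-dataset lists of significant genes. The following is a minimal Python sketch of that idea, not the authors' procedure. It assumes each GEO2R result has already been reduced to (gene, log2 fold change, adjusted P) records. The thresholds (|logFC| ≥ 1, adjusted P < 0.05), the function names, and all toy values are illustrative assumptions that do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

# Each record: (gene symbol, log2 fold change, adjusted P value).
Record = Tuple[str, float, float]


def significant_genes(records: List[Record],
                      min_abs_logfc: float = 1.0,
                      max_adj_p: float = 0.05) -> Set[str]:
    """Return genes passing both cutoffs (cutoff values are assumptions)."""
    return {gene for gene, logfc, adj_p in records
            if abs(logfc) >= min_abs_logfc and adj_p < max_adj_p}


def common_degs(datasets: Dict[str, List[Record]]) -> Set[str]:
    """Intersect the significant-gene sets of all datasets (the Venn overlap)."""
    gene_sets = [significant_genes(records) for records in datasets.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy input: the numbers and GENE_X / GENE_Y are invented for illustration.
toy = {
    "GSE15852": [("TOP2A", 2.1, 0.001), ("CDK1", 1.5, 0.01), ("GENE_X", 0.2, 0.5)],
    "GSE109169": [("TOP2A", 1.8, 0.002), ("CDK1", 1.2, 0.03), ("GENE_Y", 1.4, 0.01)],
    "GSE33447": [("TOP2A", 2.5, 0.0005), ("CDK1", 1.1, 0.04), ("GENE_Y", 0.3, 0.6)],
}
print(sorted(common_degs(toy)))  # ['CDK1', 'TOP2A']
```

This sketch ignores the direction of change. A stricter variant would also require the sign of the fold change to agree across datasets.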

Abstract

Objective: To identify key genes in breast cancer through high-throughput screening combined with bioinformatics analysis, and thereby to provide guidance for exploring potential molecular markers of breast cancer and improving understanding of its pathogenesis.

Methods: ⑴ Microarray datasets were selected from the GEO database. The differentially expressed genes in each dataset were obtained by GEO2R online analysis, and the commonly differentially expressed genes (CDEGs) were obtained from a Venn diagram. ⑵ GO and KEGG enrichment analyses of the CDEGs were performed using the DAVID database. ⑶ A visualized protein-protein interaction (PPI) network of the CDEGs was constructed with the STRING database and Cytoscape software, and the key genes of breast cancer were computed with the CytoHubba plugin. ⑷ The expression levels of these key genes in breast cancer and normal tissues were verified using the selected datasets and the GEPIA database. ⑸ Prognostic analysis of the key genes was conducted with the Kaplan-Meier plotter database, and survival curves were drawn.

Results: ⑴ The GSE15852, GSE109169 and GSE33447 gene expression datasets were selected, and a total of 355 CDEGs were identified. ⑵ The CDEGs were involved in a variety of biological processes and signaling pathways, among which plasma membrane, extracellular region, extracellular exosome, extracellular space, identical protein binding, the PPAR signaling pathway and tyrosine metabolism were closely related to breast cancer. ⑶ The PPI network obtained from the CDEGs consisted of 298 nodes and 1 327 edges, and AURKA, CDK1, MCM4, TOP2A, RRM2, PRC1, HMMR, MELK, GINS2 and UHRF1 were identified as key genes of breast cancer. ⑷ The three datasets and the GEPIA database verified that the expression of the key genes was higher in breast cancer tissues than in adjacent normal tissues. ⑸ Survival analysis showed that the OS of patients with high expression of each key gene was significantly lower than that of patients with low expression (P<0.001).

Conclusion: The key genes of breast cancer are closely related to its occurrence and prognosis, which offers new ideas for the clinical diagnosis of breast cancer and for improving patient prognosis.
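The hub-gene step can be pictured as ranking the nodes of the PPI network by a topological score. The abstract does not say which CytoHubba ranking method was used, so the sketch below assumes plain degree (the number of distinct interaction partners), which is one of the methods that plugin offers. The function name and the toy edge list are hypothetical and are not taken from the paper's network.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_degree(edges: Iterable[Tuple[str, str]],
                   top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank nodes of an undirected network by degree (an assumed scoring rule)."""
    degree: Counter = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        key = frozenset((a, b))
        if key in seen:
            continue  # count each undirected edge only once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy edge list, invented for illustration only.
toy_edges = [
    ("CDK1", "TOP2A"), ("CDK1", "AURKA"), ("CDK1", "RRM2"),
    ("TOP2A", "AURKA"), ("TOP2A", "CDK1"),  # duplicate of the first edge
]
print(rank_by_degree(toy_edges, top_n=2))  # [('CDK1', 3), ('AURKA', 2)]
```

In practice the edge list would come from a STRING export filtered by an interaction-score cutoff, and other scoring rules may rank the same network differently.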

Key words

Breast cancer/Differentially expressed genes/Key genes/Survival analysis


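Funding

National Natural Science Foundation of China (81903186)

Key Research and Development Program of Jiangxi Province (20203BBGL73202)

Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ211544)

Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ211545)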

Publication year

2024
Journal of Gannan Medical University
Gannan Medical University

Impact factor: 0.622
ISSN: 1001-5779
Number of references: 1