Mining Diagnostic and Prognostic Biomarkers of Colon Cancer Based on TCGA and GEO Databases
Objective: To mine key genes affecting the diagnosis and prognosis of colon cancer from the colon cancer gene expression data in the GEO and TCGA databases using bioinformatics methods, so as to identify biomarkers related to early diagnosis or prognosis, provide a molecular basis for the clinical diagnosis of colon cancer, and support the research and development of new drugs for colon cancer.
Methods: The GEO gene expression database was searched, and qualified gene chip datasets from colon cancer patients were downloaded together with the colon cancer transcriptome sequencing data in TCGA. R and the corresponding R packages were used for data processing and statistical analysis. First, the downloaded GEO gene chip datasets and the TCGA transcriptome dataset were organized into gene expression matrices in R. Next, the "limma" package was used for differential expression analysis to find the differentially expressed genes in each dataset. The "Robust Rank Aggregation (RRA)" package was then used to prioritize these differentially expressed genes and find the genes common to the datasets, yielding a table of co-up-regulated and co-down-regulated genes prioritized by fold change. In addition, GO and KEGG signaling pathway enrichment analyses were performed on these co-up-regulated and co-down-regulated genes, and survival correlation analysis (based on patient clinical information in TCGA) and protein-protein interaction analysis were performed on the top 20 genes.
Results: The TCGA colon cancer dataset and the GSE44861, GSE33113, and GSE39582 datasets were downloaded, and 4002, 349, 8917, and 1697 differentially expressed genes were screened out, respectively. The RRA algorithm then yielded 85 significantly co-up-regulated genes and 141 co-down-regulated genes. These genes were involved in a variety of signaling pathways, with fold changes in tumors of up to 100-fold. Among them, a variety of down-regulated genes were involved in metal ion metabolism. In addition, the expression levels of six genes (GUCA2A, CA4, GCG, CXCL1, CXCL8, and CLCA1) showed a significant positive correlation with survival time, and protein-protein interaction analysis showed that these six genes are at the core of the protein interaction network.
Conclusion: CLCA1, GUCA2A, GCG, CXCL1, and CXCL8 may be related to the development and prognosis of colon cancer, and this study provides a molecular theoretical basis for the diagnosis and prognosis of colon cancer.
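Illustrative note on the rank aggregation step: the Methods combine per-dataset lists of differentially expressed genes with Robust Rank Aggregation. The study used an R package for this; the Python sketch below is not that package's code and only illustrates the general idea behind an RRA-style score, under stated assumptions. Each gene's rank in each list is normalized to (0, 1], the normalized ranks are sorted, and each order statistic is compared with what would be expected if the ranks were uniformly random. The gene names, the toy lists, the per-list normalization, and the choice to give a missing gene the worst rank are all assumptions made for illustration.

```python
from math import comb


def order_stat_cdf(x, k, n):
    """P(k-th smallest of n iid Uniform(0,1) values <= x).

    Equals the Beta(k, n-k+1) CDF at x, written here as a binomial tail sum.
    """
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rho_score(norm_ranks):
    """RRA-style score for one gene from its normalized ranks in (0, 1]."""
    r = sorted(norm_ranks)
    n = len(r)
    p_min = min(order_stat_cdf(r[k - 1], k, n) for k in range(1, n + 1))
    # Bonferroni-style correction for taking the minimum over n order statistics.
    return min(1.0, p_min * n)


def aggregate(ranked_lists):
    """Aggregate gene lists, each ordered from most to least significant.

    Returns (gene, score) pairs sorted by ascending score (smaller = more
    consistently near the top of the lists).
    """
    genes = set().union(*ranked_lists)
    scores = {}
    for g in genes:
        norm = []
        for lst in ranked_lists:
            # Assumption: a gene absent from a list gets the worst rank, 1.0.
            norm.append((lst.index(g) + 1) / len(lst) if g in lst else 1.0)
        scores[g] = rho_score(norm)
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical toy lists standing in for per-dataset rankings.
toy_lists = [
    ["geneA", "geneB", "geneC", "geneD", "geneE"],
    ["geneA", "geneC", "geneB", "geneE", "geneD"],
    ["geneB", "geneA", "geneD", "geneC", "geneE"],
]

for gene, score in aggregate(toy_lists):
    print(f"{gene}\t{score:.3f}")
```

In this toy input, geneA ranks first because it sits at or near the top of all three lists. In a real analysis the up-regulated and down-regulated lists would be aggregated separately, as the abstract's separate counts of co-up-regulated and co-down-regulated genes suggest.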