Development and preliminary verification of lung cancer diagnostic marker genes based on the joint analysis of multiple chips
Objective: To identify, through a combined analysis of multiple GEO microarray datasets, a group of genes closely related to the occurrence of lung cancer that can serve as key marker genes for predicting lung cancer, and to verify them preliminarily.
Methods: The GSE89047, GSE108055 and GSE116959 lung cancer expression datasets were downloaded from the GEO database and merged. Batch effects were corrected with the ComBat function of the sva package in R, and differential expression analysis was performed with the limma package to screen for genes differentially expressed in lung cancer. The STRING database and Cytoscape 3.8.2 were used to construct a protein-protein interaction network of the differentially expressed genes and to identify core genes. ROC analysis was used to evaluate how well the differential genes and the core genes predict a diagnosis of lung cancer. The TIMER database was used to analyze the relationship of GPM6A expression and GPM6A copy number variation with immune cell infiltration.
Results: The combined analysis of the GSE89047, GSE108055 and GSE116959 datasets identified 938 genes differentially expressed between lung cancer tissues and normal lung tissues, which were ranked by adjusted P value. The top 10 differential genes were GPM6A, WNT3A, SLC6A4, TMEM100, TCF21, BTNL9, HSPA12B, LIMS2, VGLL3 and ITLN2. The 10 core genes identified with the STRING database and Cytoscape 3.8.2 were CCNA2, CCNB1, CENPE, FOXM1, ITGAM, KIF11, KIF20A, KIF23, KIF2C and MMP9. In ROC analysis, the AUC (95% CI) was 0.948 (0.874-0.986) for GPM6A, 0.961 (0.886-0.992) for the top 10 differential genes, and 0.830 (0.722-0.895) for the 10 core genes, indicating that the marker genes selected in this study predict lung cancer well. TIMER analysis showed that, among immune cell types, GPM6A expression correlated most strongly with macrophage infiltration in both lung adenocarcinoma (r=0.347, P<0.001) and lung squamous carcinoma (r=0.425, P<0.001). GPM6A copy number variation correlated with infiltration of B cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in lung adenocarcinoma (P<0.05), and with infiltration of B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils and dendritic cells in lung squamous carcinoma (P<0.05).
Conclusion: Through a combined analysis of multiple microarray datasets, this study developed and preliminarily verified a set of marker genes with good predictive ability for the diagnosis of lung cancer, and found that GPM6A, the most significantly differentially expressed marker gene, is closely related to immune cell infiltration.
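The AUC values reported above summarize how well a score separates tumor samples from normal samples. As a minimal sketch of the idea, and not the authors' pipeline, the AUC of a single-gene score can be computed as the fraction of tumor/normal pairs in which the tumor sample receives the higher score, with ties counted as half. The function name `roc_auc` and the expression values below are hypothetical and chosen only for illustration; the abstract does not state how the 10-gene panels were combined into one score, so only the single-gene case is shown.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the ROC AUC of `scores` for binary `labels` (1 = tumor, 0 = normal).

    Uses the pairwise (Mann-Whitney) definition: the proportion of
    positive/negative pairs in which the positive sample has the higher
    score, counting ties as half a win.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one gene (illustrative, not from the study).
tumor_expr = [2.1, 3.0, 2.5, 4.2]
normal_expr = [5.0, 4.8, 3.9, 6.1]

# Assumption for this toy example: lower expression indicates tumor,
# so expression is negated to make a higher score mean "more tumor-like".
toy_scores = [-x for x in tumor_expr + normal_expr]
toy_labels = [1] * len(tumor_expr) + [0] * len(normal_expr)
toy_auc = roc_auc(toy_scores, toy_labels)
```

In this toy data, 15 of the 16 tumor/normal pairs are ordered correctly, so `toy_auc` is 0.9375. An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation; the confidence intervals reported in the abstract would require an additional step, such as bootstrapping, that is not sketched here.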