A single type of biological data cannot reveal complex biological processes and disease mechanisms. To address this problem, a disease-causing gene prediction method based on a multi-information fusion strategy, DGPMIF, was proposed. First, a heterogeneous network comprising disease-phenotype, disease-gene, protein-protein, and gene-ontology associations was constructed. A network embedding algorithm was used to extract low-dimensional vector representations of the nodes in the heterogeneous network, and a network topology algorithm was combined with it to extract network structural characteristics. Second, cosine similarity was used to measure the similarity between node vectors and to predict disease-gene associations. Finally, the effectiveness of DGPMIF was verified through case studies of specific diseases and through comparison with classic disease-causing gene prediction methods. The results show that the different types of association data each play an important role in enhancing prediction performance, and that multi-level information fusion improves the prediction of disease-causing genes. DGPMIF can efficiently mine the information contained in the network and has important reference value for research on predicting disease-gene associations.
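The similarity-scoring step summarized above can be illustrated with a minimal sketch. It is not the DGPMIF implementation: the abstract does not say how embedding vectors and structural characteristics are fused, so simple concatenation is assumed here, and the helper names (`fuse`, `rank_genes`), the gene labels, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(embedding, structural):
    # Assumption: fusion by concatenation; the abstract does not specify the scheme.
    return list(embedding) + list(structural)


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, score) pairs sorted by descending cosine similarity to the disease."""
    scores = {gene: cosine_similarity(disease_vec, vec) for gene, vec in gene_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up vectors: a 2-d "embedding" plus a 1-d "structural" feature per node.
disease = fuse([1.0, 0.0], [1.0])
genes = {
    "GENE_A": fuse([1.0, 0.0], [1.0]),  # same direction as the disease vector
    "GENE_B": fuse([0.0, 1.0], [0.0]),  # orthogonal to the disease vector
    "GENE_C": fuse([1.0, 1.0], [0.0]),  # partially aligned
}

ranking = rank_genes(disease, genes)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

In this sketch, genes whose fused vectors point in nearly the same direction as the disease vector are ranked first as candidate disease-causing genes. A real pipeline would take the vectors from a trained network embedding model rather than from hand-written lists.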
Key words
other disciplines of artificial intelligence/disease-causing genes/heterogeneous network/information fusion/network embedding/network structural characteristics