Using text data mining techniques for understanding the p53 gene expression regulatory network
To study the relationship between the p53 gene and its downstream (target) genes, and thereby to understand the p53 network, we applied a text data mining method. Noncommercial software written in Perl 5.10 was used to mine PubMed records on the p53 gene together with the human gene ontology, and the p53 network was constructed by linkage clustering analysis. The results show that the frequency distribution of the target genes is correlated with that of the gene ontology terms across the whole text collection, and that low-frequency genes account for a significantly smaller proportion than high-frequency genes in the mined text. These results indicate that the distribution of genes in the p53 network is closely related to how frequently the genes occur in the text, and that text size has an important influence on the accuracy of text data mining.
text data mining; p53 gene; gene ontology; linkage clustering analysis
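
The workflow summarized in the abstract (counting gene mentions in text, then grouping genes by linkage clustering) can be illustrated with a minimal Python sketch. It is not the authors' Perl software. The function names, the toy sentences, the gene list, and the use of co-occurrence counts as the similarity measure are all assumptions made for illustration, and single linkage is assumed because the abstract does not say which linkage criterion was used.

```python
import re
from collections import Counter
from itertools import combinations

# Toy input, invented for illustration only (not data from the study).
ABSTRACTS = [
    "TP53 activates CDKN1A after DNA damage.",
    "MDM2 binds TP53 and promotes its degradation.",
    "TP53 induces BAX and CDKN1A expression.",
    "BRCA1 interacts with RAD51 in DNA repair.",
]
GENES = ["TP53", "CDKN1A", "MDM2", "BAX", "BRCA1", "RAD51"]


def genes_in(text, genes):
    """Return the set of gene symbols that appear as whole tokens in text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
    return {g for g in genes if g.upper() in tokens}


def gene_frequencies(abstracts, genes):
    """Count how many abstracts mention each gene."""
    counts = Counter()
    for text in abstracts:
        counts.update(genes_in(text, genes))
    return counts


def cooccurrence(abstracts, genes):
    """Count how many abstracts mention each unordered gene pair."""
    pairs = Counter()
    for text in abstracts:
        pairs.update(combinations(sorted(genes_in(text, genes)), 2))
    return pairs


def single_linkage_clusters(genes, cooc, threshold):
    """Single-linkage clustering cut at a fixed similarity threshold.

    Two genes end up in the same cluster if a chain of pairs connects them,
    each pair co-occurring in at least `threshold` abstracts (assumed rule).
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), n in cooc.items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in groups.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    freq = gene_frequencies(ABSTRACTS, GENES)
    cooc = cooccurrence(ABSTRACTS, GENES)
    print(freq.most_common())
    print(single_linkage_clusters(GENES, cooc, threshold=1))
```

With a larger text collection, the frequency table shows which genes dominate the mined text, and raising `threshold` keeps only the links supported by more documents. This is one simple way to see how text size and gene frequency can shape the resulting network.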