Flora analysis based on a Dirichlet process mixture model and K-means
Population typing is an effective way to better understand complex biological problems such as those of human physical and mental health. Clustering defines intestinal types by grouping samples, which reduces the complexity of the data. However, the traditional K-means clustering algorithm provides no way to determine the value of K, which must be chosen in advance. This paper improves the traditional K-means clustering algorithm and verifies it on a public dataset. The experimental results show that the improved algorithm solves the problem of selecting K, and that the stability, accuracy and quality of the clustering results are significantly improved. When the improved model is applied to operational taxonomic unit (OTU) data of intestinal flora, it not only effectively identifies the similarities among samples from patients with type 2 diabetes, but also identifies the OTUs that have the greatest impact on the heterogeneity of the flora structure, providing a new perspective for clinical approaches to type 2 diabetes.
K-means algorithm; Dirichlet process mixture model; Flora analysis; Population typing; Clustering