Hunan University Reports Findings in Machine Learning (Systematic tracking of nitrogen sources in complex river catchments: Machine learning approach based on microbial metagenomics)

Abstract

New research on Machine Learning is the subject of a report. According to news reporting originating in Changsha, People’s Republic of China, by NewsRx journalists, research stated, “Tracking nitrogen pollution sources is crucial for the effective management of water quality; however, it is a challenging task due to the complex contaminative scenarios in the freshwater systems. The contaminative pattern variations can induce quick responses of aquatic microorganisms, making them sensitive indicators of pollution origins.” The news reporters obtained a quote from the research from Hunan University, “In this study, the soil and water assessment tool, accompanied by a detailed pollution source database, was used to detect the main nitrogen pollution sources in each sub-basin of the Liuyang River watershed. Thus, each sub-basin was assigned to a known class according to SWAT outputs, including point source pollution-dominated area, crop cultivation pollution-dominated area, and the septic tank pollution-dominated area. Based on these outputs, the random forest (RF) model was developed to predict the main pollution sources from different river ecosystems using a series of input variable groups (e.g., natural macroscopic characteristics, river physicochemical properties, 16S rRNA microbial taxonomic composition, microbial metagenomic data containing taxonomic and functional information, and their combination). The accuracy and the Kappa coefficient were used as the performance metrics for the RF model. Compared with the prediction performance among all the input variable groups, the prediction performance of the RF model was significantly improved using metagenomic indices as inputs. Among the metagenomic data-based models, the combination of the taxonomic information with functional information of all the species achieved the highest accuracy (0.84) and increased median Kappa coefficient (0.70). 
Feature importance analysis was used to identify key features that could serve as indicators for sudden pollution accidents and contribute to the overall function of the river system. The bacteria Rhabdochromatium marinum, Frankia, Actinomycetia, and Competibacteraceae were the most important species, whose mean decrease Gini indices were 0.0023, 0.0021, 0.0019, and 0.0018, respectively, although their relative abundances ranged only from 0.0004 to 0.1 %. Among the top 30 important variables, functional variables constituted more than half, demonstrating the remarkable variation in the microbial functions among sites with distinct pollution sources and the key role of functionality in predicting pollution sources. Many functional indicators related to the metabolism of Mycobacterium tuberculosis, such as K24693, K25621, K16048, and K14952, emerged as significant important factors in distinguishing nitrogen pollution origins.”
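
The two performance metrics named in the report, accuracy and the Kappa coefficient, can be illustrated with a minimal sketch. The class labels and the ten example predictions below are made-up toy values, not data from the study, and the helper names `accuracy` and `cohen_kappa` are hypothetical. The sketch assumes the Kappa in question is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance.

```python
from collections import Counter

# Hypothetical labels for the three source classes described in the report.
POINT, CROP, SEPTIC = "point_source", "crop_cultivation", "septic_tank"


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the reference class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy) and p_e is the agreement
    expected by chance from the marginal class frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class appears in both sequences, so kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: reference classes (e.g. from a watershed model) versus
# classifier predictions for ten sub-basins. Values are illustrative only.
y_true = [POINT, POINT, CROP, CROP, SEPTIC, SEPTIC, CROP, POINT, SEPTIC, CROP]
y_pred = [POINT, POINT, CROP, CROP, SEPTIC, SEPTIC, CROP, CROP, SEPTIC, POINT]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Because kappa subtracts chance agreement, it is always at or below the accuracy whenever accuracy is below 1, which is why a model can report a high accuracy alongside a noticeably lower Kappa coefficient.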

Key words

Changsha/People’s Republic of China/Asia/Cyborgs/Emerging Technologies/Machine Learning/Nitrogen

Publication year

2024

Robotics & Machine Learning Daily News
