Research on coagulation dosing prediction based on K-Means clustering and ensemble learning under small sample data

Abstract: To address the small-sample problem in coagulant dosing prediction, a PAC dosage prediction method based on K-Means clustering and ensemble learning is proposed. First, K-Means clustering on two features, raw water turbidity and water temperature, divides water quality into three classes, and stratified sampling draws the training and test sets from the three classes. Second, a PAC dosage ensemble prediction model (KM-Bagging) is built on the Bagging ensemble learning algorithm from seven learners: Support Vector Machine, Random Forest, AdaBoost, GBDT (Gradient Boosting Decision Tree), CatBoost, XGBoost, and LightGBM. Finally, the method is validated on 2021-2022 operating data from a water supply plant in Yinchuan. The results show that the KM-Bagging model predicts PAC dosage accurately on small samples, with R² above 0.8 and MAPE below 5%. Models built on 6 or 9 months of daily monitoring data suit cases where the monitoring period is short and accuracy requirements are modest; their predictions can serve as a reference for adjusting PAC dosage when raw water quality changes abruptly. With 1 year of daily monitoring data, prediction accuracy meets the requirements of engineering application and can provide auxiliary guidance for actual PAC dosing at the plant. The results provide a reference for modeling and predicting coagulant dosing with small-sample data.
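The three steps named in the abstract (clustering, stratified sampling, bagged ensemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the feature scaling before K-Means is an added assumption, only the four learners available in scikit-learn are used (CatBoost, XGBoost, and LightGBM are left out to avoid extra dependencies), and "Bagging" is interpreted here as fitting each learner on its own bootstrap resample and averaging the predictions. The abstract does not say how the seven learners are actually combined.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for one year of daily records (NOT the plant's data).
n_days = 365
turbidity = rng.uniform(5.0, 60.0, n_days)    # NTU, hypothetical range
temperature = rng.uniform(2.0, 28.0, n_days)  # deg C, hypothetical range
X = np.column_stack([turbidity, temperature])
# Hypothetical dose relationship, invented only so the sketch has a target.
y = 10.0 + 0.3 * turbidity - 0.1 * temperature + rng.normal(0.0, 0.5, n_days)

# Step 1: cluster water quality into 3 classes on the two features.
# Scaling before clustering is an assumption of this sketch.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Step 2: stratified split, so every cluster appears in both train and test.
X_train, X_test, y_train, y_test, lab_train, lab_test = train_test_split(
    X, y, labels, test_size=0.2, stratify=labels, random_state=0)


def make_learners():
    """Base learners; a scikit-learn-only subset of the seven in the abstract."""
    return [
        make_pipeline(StandardScaler(), SVR(C=10.0)),
        RandomForestRegressor(n_estimators=50, random_state=0),
        AdaBoostRegressor(n_estimators=50, random_state=0),
        GradientBoostingRegressor(n_estimators=50, random_state=0),
    ]


def fit_bagged(X_tr, y_tr, seed=0):
    """Fit each learner on its own bootstrap resample of the training set."""
    boot_rng = np.random.default_rng(seed)
    fitted = []
    for model in make_learners():
        idx = boot_rng.integers(0, len(X_tr), size=len(X_tr))
        fitted.append(model.fit(X_tr[idx], y_tr[idx]))
    return fitted


def predict_bagged(models, X_new):
    """Average the base learners' predictions."""
    return np.mean([m.predict(X_new) for m in models], axis=0)


# Step 3: train the bagged ensemble and score it with R2 and MAPE.
models = fit_bagged(X_train, y_train)
y_pred = predict_bagged(models, X_test)

r2 = r2_score(y_test, y_pred)
mape = np.mean(np.abs((y_test - y_pred) / y_test))  # a fraction, not percent
print(f"R2 = {r2:.3f}, MAPE = {100 * mape:.2f}%")
```

Scores printed by this sketch reflect only the synthetic data and say nothing about the accuracy reported in the abstract. Passing `stratify=labels` is what keeps the three water-quality classes in the same proportions in both subsets, which matters most when the sample is small.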

Keywords:

coagulation dosage prediction; small sample data; Bagging ensemble learning; K-Means clustering

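Authors:

Wang Shijie (王世杰), Li Yiming (李一鸣), Zhi Yin (植殷), Wu Renchao (武仁超), Wang Tao (王涛), Cheng Ziwei (程紫微), Zheng Lei (郑磊), Xiao Feng (肖峰)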

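Author affiliations:

School of Water Resources and Hydropower Engineering, North China Electric Power University (华北电力大学水利与水电工程学院), Beijing 102206

北京环球中科水务科技股份有限公司 (a water technology company), Beijing 100085

宁夏长城水务有限责任公司 (a water utility company), Yinchuan 750004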

Funding:

National Natural Science Foundation of China

Grant number:

52030003

Publication year:

2024

DOI:

10.12030/j.cjee.202308113

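Chinese Journal of Environmental Engineering (环境工程学报)

Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences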

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.804

ISSN: 1673-9108

Year, volume (issue): 2024, 18(1)

Number of references: 27