Research on the application of ensemble learning in predicting inorganic arsenic content in aquatic products
Objective: Chronic exposure to inorganic arsenic in aquatic products is harmful to human health, and its detection is time-consuming. To enable rapid prediction of inorganic arsenic content, an ensemble learning prediction model was established for data with small sample sizes and few features.
Methods: Data on heavy metals in aquatic products in Qinhuangdao from 2018 to 2022 were collected. The Pearson correlation coefficient was used to analyze the correlations among lead, cadmium, mercury and inorganic arsenic, and to check for multicollinearity. A stepwise regression feature combination method was used to test the goodness of fit (R²) and mean square error (MSE) of the gradient boosting regressor (GBR) and random forest (RF) models under different feature combinations, in order to screen for the optimal combination. Five ensemble learning models were then compared in terms of model evaluation metrics, recovery rate and target hazard quotient (THQ).
Results: The four elements were only weakly correlated, and no multicollinearity was found. The R² values of the RF and GBR algorithms were 89.9% and 93.3%, respectively. The R² values of the extremely randomized trees (ET) model for shellfish, fish, crab and shrimp were 100.00%, 99.42%, 100.00% and 99.92%, respectively, and the box plots showed that its prediction errors had the smallest deviation. The outliers were within the acceptable range, and the dietary risk assessed by the THQ index was consistent before and after prediction.
Conclusion: The method proposed in this study targets rapid prediction of inorganic arsenic with small sample sizes and few features, and can provide a low-cost, high-efficiency approach to early warning of food safety risks.
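The evaluation quantities named in the Methods (Pearson correlation, R², MSE, recovery rate and THQ) can be illustrated with a minimal sketch. It is not the study's implementation and makes several assumptions: the concentrations are hypothetical values rather than study data, recovery rate is taken to mean predicted divided by measured concentration, and THQ follows the commonly used form THQ = (EF × ED × FIR × C) / (RfD × BW × AT) × 10⁻³ with an oral reference dose of 3 × 10⁻⁴ mg/kg/day for inorganic arsenic. The intake rate, body weight and exposure duration are placeholder values, and all function names are illustrative.

```python
import math

# Hypothetical inorganic arsenic concentrations (mg/kg); NOT data from the study.
measured = [0.10, 0.05, 0.20, 0.08, 0.15]
predicted = [0.11, 0.05, 0.19, 0.08, 0.16]


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mse(y_true, y_pred):
    """Mean square error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination (goodness of fit)."""
    mt = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def recovery_rates(y_true, y_pred):
    """Assumed definition: predicted / measured, as a percentage."""
    return [100.0 * p / t for t, p in zip(y_true, y_pred)]


def thq(conc, fir, rfd=3e-4, bw=60.0, ef=365.0, ed=70.0):
    """Target hazard quotient.

    conc: concentration (mg/kg); fir: food intake rate (g/day);
    rfd: oral reference dose (mg/kg/day); bw: body weight (kg);
    ef: exposure frequency (days/year); ed: exposure duration (years).
    Defaults for bw, ef and ed are placeholder assumptions.
    """
    at = 365.0 * ed  # averaging time in days for non-carcinogenic effects
    return (ef * ed * fir * conc) / (rfd * bw * at) * 1e-3


def same_risk_class(thq_a, thq_b):
    """True if both quotients fall on the same side of the threshold 1."""
    return (thq_a < 1.0) == (thq_b < 1.0)


if __name__ == "__main__":
    fir = 30.0  # hypothetical intake of aquatic products, g/day
    print("Pearson r:", pearson(measured, predicted))
    print("R2:", r2(measured, predicted))
    print("MSE:", mse(measured, predicted))
    print("Recovery (%):", recovery_rates(measured, predicted))
    thq_measured = thq(mean(measured), fir)
    thq_predicted = thq(mean(predicted), fir)
    print("THQ measured / predicted:", thq_measured, thq_predicted)
    print("Risk class consistent:", same_risk_class(thq_measured, thq_predicted))
```

Comparing which side of 1 each THQ falls on is one simple way to read "consistent before and after prediction"; the study may have used a different criterion.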