
Influencing factors and predictive analysis of myopia among primary and secondary school students based on hybrid feature selection

Objective To build a myopia prediction model and analyze the factors influencing myopia among primary and secondary school students in Xincheng District, Xi'an, so as to provide a scientific basis for developing student myopia prevention and control strategies and implementing interventions.

Methods Based on the 2022 Shaanxi Province student common disease surveillance program, visual acuity was examined with a 5 m standard logarithmic visual acuity chart, and refraction was measured with a desktop computerized optometer. A total of 2 511 students who took part in vision screening and completed the questionnaire were included. Five feature selection methods were used to screen factors influencing myopia: support vector machine recursive feature elimination (SVM-RFE), cross-validated least absolute shrinkage and selection operator regression (LASSOCV), χ² test with SelectKBest, decision tree with SelectFromModel, and mutual information. The selected variables were then entered into a multivariable logistic regression and 5 classification models to predict the risk of myopia.

Results A total of 1 780 students with myopia were detected, a myopia rate of 70.89% (1 780/2 511): 69.24% (833/1 203) among boys and 72.40% (947/1 308) among girls. The myopia rates among primary school, junior high school, and senior high or vocational high school students were 54.69% (560/1 024), 78.96% (473/599), and 84.12% (747/888), respectively. Seventeen variables appeared in the top 15 of at least 3 of the 5 feature selection methods. All 5 methods selected age and parental myopia; 4 methods selected whether parents remind the student about reading and writing posture, keeping the chest more than one fist away from the desk edge when reading and writing, and time spent in academic tutoring classes such as English, mathematics, and writing. Logistic regression showed that the following were factors influencing myopia: age (OR=1.329, 95% CI: 1.286-1.373, P<0.001); parental myopia (one parent myopic: OR=1.808, 95% CI: 1.453-2.251, P<0.001; both parents myopic: OR=3.566, 95% CI: 2.691-4.726, P<0.001); parental reminders about reading and writing posture (OR=1.349, 95% CI: 1.092-1.666, P=0.006); outdoor activity during recess (OR=0.774, 95% CI: 0.636-0.943, P=0.011); keeping the eyes more than 3 m from the screen when watching TV (often or always: OR=0.792, 95% CI: 0.589-1.064, P=0.122; never or occasionally: OR=1.099, 95% CI: 0.835-1.445, P=0.501); and average daily time spent on homework and reading after school (OR=1.342, 95% CI: 1.105-1.631, P=0.003). Across the 5 prediction models, every model performed better after variable screening than before. SVM-RBF after variable screening achieved the best classification performance (area under the receiver operating characteristic curve [AUC]=0.73, accuracy=0.72, F1-score=0.74, precision=0.78, recall=0.72), followed by SVM-POLY after variable screening (AUC=0.73, accuracy=0.71, F1-score=0.73, precision=0.78, recall=0.71). This indicates that including more variables does not necessarily improve a model's predictive performance.

Conclusion The myopia rate among students rises rapidly with age and school stage. Beyond this cumulative effect, the rise is also associated with the heavier schoolwork burden and the longer time spent on mobile phones and other electronic devices that come with higher grades.
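The consensus rule described in the Results (keep a variable if it appears in the top 15 of at least 3 of the 5 feature selection methods) can be sketched as a simple vote count. This is a minimal illustration, not the authors' code: the function name `consensus_features`, the variable names, and the toy rankings are all hypothetical, and the example uses a top 3 instead of a top 15 so that it stays short.

```python
from collections import Counter


def consensus_features(rankings, top_k=15, min_votes=3):
    """Return features found in the top_k of at least min_votes rankings.

    rankings maps a method name to its features, ordered best first.
    """
    votes = Counter()
    for ranked in rankings.values():
        # set() guarantees each method casts at most one vote per feature
        votes.update(set(ranked[:top_k]))
    kept = [name for name, n in votes.items() if n >= min_votes]
    # Most-voted first, ties broken alphabetically for a stable order
    return sorted(kept, key=lambda name: (-votes[name], name)), votes


# Hypothetical toy rankings (NOT the study's data), one list per method
rankings = {
    "SVM-RFE": ["age", "parental_myopia", "posture_reminder", "homework_time"],
    "LASSOCV": ["parental_myopia", "age", "tutoring_time", "posture_reminder"],
    "chi2-SelectKBest": ["age", "parental_myopia", "outdoor_recess", "tv_distance"],
    "DecisionTree-SelectFromModel": ["age", "homework_time", "parental_myopia", "posture_reminder"],
    "mutual_information": ["age", "parental_myopia", "tutoring_time", "sleep_time"],
}

selected, votes = consensus_features(rankings, top_k=3, min_votes=3)
print(selected)
```

With the study's settings the call would use `top_k=15` and `min_votes=3`; the per-method rankings themselves would come from whatever library implements each selector.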
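The reported odds ratios can be mapped back to the logistic regression scale. The sketch below assumes the intervals are standard Wald intervals, exp(β ± z·SE), which the abstract does not state; under that assumption the coefficient is β = ln(OR), the standard error can be recovered from the interval width, and the OR should sit at the geometric mean of its confidence limits. The helper name `wald_summary` is hypothetical.

```python
import math


def wald_summary(or_hat, ci_low, ci_high, z=1.959964):
    """Recover beta, SE, and the CI midpoint OR, assuming a Wald interval."""
    beta = math.log(or_hat)                                  # log-odds coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)    # half-width / z on the log scale
    midpoint_or = math.sqrt(ci_low * ci_high)                # geometric mean of the limits
    return beta, se, midpoint_or


# (OR, lower limit, upper limit) as reported in the abstract
reported = {
    "age": (1.329, 1.286, 1.373),
    "one parent myopic": (1.808, 1.453, 2.251),
    "both parents myopic": (3.566, 2.691, 4.726),
    "posture reminder": (1.349, 1.092, 1.666),
    "outdoors at recess": (0.774, 0.636, 0.943),
    "homework/reading time": (1.342, 1.105, 1.631),
}

results = {}
for name, (or_hat, low, high) in reported.items():
    beta, se, mid = wald_summary(or_hat, low, high)
    results[name] = (beta, se, mid)
    print(f"{name:22s} beta={beta:+.3f}  SE={se:.3f}  midpoint OR={mid:.3f}")
```

A midpoint that matches the reported OR to rounding precision is consistent with the Wald assumption; a negative β (as for outdoor activity during recess) corresponds to an OR below 1.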

Keywords: Primary and secondary school students; Myopia; Risk prediction; Hybrid feature selection; Machine learning

Authors: Xing Meng (邢蒙), Li Hongmei (李红梅), Zhang Xue (张雪), Wang Ming (王铭), Li Yifei (李依霏)


Department of School Health, Xincheng District Center for Disease Control and Prevention, Xi'an, Shaanxi 710000

Department of Infectious Disease Control, Xincheng District Center for Disease Control and Prevention, Xi'an


2024

Journal: Preventive Medicine Tribune (预防医学论坛)
Sponsor: Chinese Preventive Medicine Association (中华预防医学会)


Impact factor: 0.645
ISSN: 1672-9153
Year, volume (issue): 2024, 30(5)