Abstract
Using data from the adult tobacco survey conducted by the World Health Organization in China in 2018, this study explores the factors that influence adult smoking behavior. First, the raw data are cleaned, which includes removing irrelevant variables and combining existing variables into new ones. Second, feature selection is performed on the processed dataset by combining the chi-square test, analysis of variance (ANOVA), and the maximal information coefficient (MIC). Third, models are built with the XGBoost and LightGBM algorithms to rank and analyze the factors affecting adult smoking behavior. Finally, variable-combination modeling is carried out with LightGBM, the better-performing model, to further characterize smokers. The modeling identifies gender, tobacco environment, attitude toward tax increases, perception of low-tar cigarettes, education level, and age as the factors influencing smoking behavior, in decreasing order of importance.
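As an illustration of the chi-square screening step in the feature-selection stage, the following is a minimal sketch and not the study's actual code. It computes the Pearson chi-square statistic between each categorical variable and a binary smoking indicator and ranks the variables by that statistic. The function names, the column names, and the toy data are assumptions made up for this example; ranking by the raw statistic is also a simplification, since statistics with different degrees of freedom are normally compared through their p-values.

```python
from collections import Counter


def chi_square_statistic(feature, target):
    """Return (Pearson chi-square statistic, degrees of freedom)
    for two equally long sequences of categorical values."""
    if len(feature) != len(target):
        raise ValueError("feature and target must have the same length")
    n = len(feature)
    observed = Counter(zip(feature, target))  # contingency-table cell counts
    row_totals = Counter(feature)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def rank_features(columns, target):
    """Order feature names by chi-square statistic, strongest association first."""
    scores = {name: chi_square_statistic(values, target)[0]
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data, invented for illustration only (not survey data).
smoker = [1, 1, 1, 0, 0, 0, 1, 0]
columns = {
    "gender": ["m", "m", "m", "f", "f", "f", "m", "f"],
    "region": ["a", "b", "a", "b", "a", "b", "b", "a"],
}

print(rank_features(columns, smoker))
```

In this toy sample "gender" coincides exactly with the smoking indicator and "region" is independent of it, so "gender" is ranked first. In a full pipeline the retained variables would then be passed to the tree-based models for importance ranking.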
Funding
General Science and Technology Project of the Wenshan Prefecture Company, Yunnan Provincial Tobacco Company (20235326002)