
Variety classification of the same plant species based on main chemical components selected by feature screening

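Varieties of the same plant species can show distinctive diversity, which may lead to very different applications. To explore methods for classifying different varieties of the same plant species, we classified four groups of tobacco of different varieties. Using chemical composition data for the tobacco, we compared four feature selection methods (max-relevance min-redundancy, sequential backward floating selection, genetic algorithm, and random forest) by the performance of the support vector machine models built on each. The results show that sequential backward floating selection gives the best classification accuracy. Proline, potassium content, rutin, citric acid, and pH form the intersection of the four selected feature sets and have great potential for tobacco classification. This set of methods may also be applicable to studies of other plants.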
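The study pairs sequential backward floating selection (SBFS) with SVM classifiers. Below is a minimal sketch of SBFS itself. It assumes the caller supplies a scoring function for a feature subset. In the study that role would be played by SVM classification accuracy on the tobacco chemical composition data, which is not reproduced here. The function `sbfs`, the `toy_score` function, and the feature names `a` to `e` are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

# Maps a feature subset to a quality estimate (higher is better).
# In the study this would be SVM accuracy; here it is an arbitrary callable.
ScoreFn = Callable[[FrozenSet[str]], float]


def sbfs(features: Sequence[str], score: ScoreFn,
         k_target: int) -> Tuple[FrozenSet[str], float]:
    """Sequential backward floating selection down to k_target features."""
    all_feats = frozenset(features)
    if not 1 <= k_target <= len(all_feats):
        raise ValueError("k_target must be between 1 and the number of features")
    current = all_feats
    # best[k] is the highest-scoring subset of size k found so far.
    best: Dict[int, Tuple[FrozenSet[str], float]] = {
        len(current): (current, score(current))}

    while len(current) > k_target:
        # Exclusion: drop the feature whose removal keeps the score highest.
        # Iterating in sorted order makes tie-breaking deterministic.
        current = max((current - {f} for f in sorted(current)), key=score)
        s = score(current)
        if len(current) not in best or s > best[len(current)][1]:
            best[len(current)] = (current, s)

        # Conditional inclusion (the "floating" step): add back a removed
        # feature only if that strictly beats the best subset of that size.
        while len(current) < len(all_feats):
            grown = max((current | {f} for f in sorted(all_feats - current)),
                        key=score)
            g = score(grown)
            if g > best[len(grown)][1]:
                current = grown
                best[len(grown)] = (grown, g)
            else:
                break

    return best[k_target]


# Hypothetical toy problem: a, b, c are informative; d, e add a small penalty.
USEFUL = frozenset({"a", "b", "c"})


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


if __name__ == "__main__":
    chosen, value = sbfs(["a", "b", "c", "d", "e"], toy_score, 3)
    print(sorted(chosen), value)
```

The floating step is what separates SBFS from plain backward elimination. A feature dropped early can return if adding it back to the current subset beats the best subset previously found at that size.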
Classification for the varieties of single plant based on important chemical compositions by feature selections
Varieties of the same plant species generally show distinctive diversity, which can lead to very different applications. To explore a potential method for classifying the varieties of a single plant species, four groups of tobacco varieties were taken as an example. To classify these four groups from a dataset of chemical compositions, four feature selection methods were employed: max-relevance min-redundancy (mRMR), sequential backward floating selection (SBFS), genetic algorithm (GA), and random forest (RF). The results show that SBFS is the best method, because it yields the highest accuracy in support vector machine (SVM) models. Five features lie in the intersection of the subsets selected by the four methods: proline, potassium content, rutin, citric acid, and pH. These may be the chemical components most closely related to tobacco variety and show great potential for tobacco classification. This set of methods might also be applied to other plants.
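The consensus step reported above amounts to intersecting the subsets chosen by the four selectors. The sketch below assumes each method's selection is available as a list of feature names. The `consensus` helper and the feature names `f1` to `f5` are hypothetical placeholders, not the study's actual selections. It also counts votes, which shows which features nearly made the intersection.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple


def consensus(selections: Dict[str, Iterable[str]]
              ) -> Tuple[FrozenSet[str], List[Tuple[str, int]]]:
    """Features picked by every method, plus how many methods picked each one."""
    sets = [frozenset(s) for s in selections.values()]
    common = frozenset.intersection(*sets) if sets else frozenset()
    votes = Counter(f for s in sets for f in s)
    # Most-voted first; ties broken alphabetically for a stable report.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return common, ranked


if __name__ == "__main__":
    # Hypothetical selections; these are NOT the subsets reported in the study.
    picks = {
        "mRMR": ["f1", "f2", "f3"],
        "SBFS": ["f1", "f2", "f4"],
        "GA": ["f1", "f2", "f3", "f5"],
        "RF": ["f1", "f2"],
    }
    common, ranked = consensus(picks)
    print(sorted(common), ranked)
```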

Keywords: Feature selection, Chemical composition, Genetic algorithm, Max-relevance-min-redundancy

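Sha Yunfei (沙云菲), Wang Liang (王亮), Liu Tai'ang (刘太昂), Yu Jie (于洁), Ge Jiong (葛炯), Li Minjie (李敏杰), Lu Wencong (陆文聪), Sun Xiang (孙翔)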


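Technology Center, Shanghai Tobacco Group Co., Ltd., Shanghai 200082

Shanghai Fanyang Information Technology Co., Ltd., Shanghai 200444

Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444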


Funding: China National Tobacco Corporation Science and Technology Major Project (Zhong Yan Ban [2016] 259); National Key Research and Development Program of China (2016YFB0700504)

2019

Computers and Applied Chemistry (计算机与应用化学)
Institute of Process Engineering, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University core journal list (北大核心)
Impact factor: 0.386
ISSN: 1001-4160
Year, volume (issue): 2019, 36(5)