Research on grade classification of strong-flavor Baijiu base liquor based on unbalanced data sets

To address the drop in classification performance caused by class imbalance in the grade classification of strong-flavor (Nongxiangxing) Baijiu base liquor from gas chromatography-mass spectrometry (GC-MS) data, a classification method for unbalanced data sets was proposed. First, the synthetic minority over-sampling technique (SMOTE) was used to expand the minority-class base liquor samples and reduce the imbalance. Next, sparse principal component analysis (SPCA) was applied to reduce the dimensionality of the GC-MS data. Finally, a deep forest (DF) classifier was used to build the classification and recognition model for strong-flavor Baijiu base liquor. The results showed that balancing the base liquor data set with SMOTE effectively improved classification accuracy, and the accuracy of the resulting model reached 96.61%. The model can provide guidance and reference for the grade classification of base liquor.
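The abstract names SMOTE as the balancing step but gives no implementation details. The following is only a minimal sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours), written in plain Python. The function name `smote`, its parameters, and the two-feature toy data are assumptions made for illustration and are not taken from the article; the SPCA and deep forest steps are not covered here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority-class vectors.

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [s for j, s in enumerate(minority) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two features standing in for GC-MS peak intensities.
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.1]]

# Oversample the minority class until both classes have the same size.
balanced_minority = minority + smote(minority, len(majority) - len(minority), k=2)
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic samples do not leak into the evaluation data; whether the authors did so is not stated in the abstract.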

gas chromatography-mass spectrometry; strong-flavor Baijiu base liquor; synthetic minority over-sampling technique; sparse principal component analysis; base liquor classification

Wang Jihua (王继华), Li Zhaofei (李兆飞), Yang Zhuang (杨壮), Zhao Na (赵娜), Zhang Guiyu (张贵宇)

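Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science and Engineering, Yibin 644000, Sichuan, China

School of Automation and Information Engineering, Sichuan University of Science and Engineering, Yibin 644000, Sichuan, China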

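Key Science and Technology Program of the Zigong Science and Technology Bureau, Sichuan Province; Sichuan University of Science and Engineering Research Project; Sichuan University of Science and Engineering Graduate Innovation Fund; Sichuan University of Science and Engineering Graduate Course Construction Project

2019YYJC15; 2020RC32; Y2022150; AL202213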

2024

China Brewing (中国酿造)
China Condiment Industrial Association; Beijing Academy of Food Sciences

CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 0.759
ISSN: 0254-5071
Year, volume (issue): 2024, 43(1)