Classification of tobacco leaf formula modules by combining the advantages of multiple methods
To study in depth the grouping methods and principles behind cigarette brand tobacco leaf formulas, this study used 12 sensory evaluation indexes measured on 457 tobacco samples to compare four discriminant analysis methods and four machine learning methods across four tobacco leaf formula module categories. The methods were compared on modeling set accuracy (R), validation set accuracy (r), and average accuracy (m). A high-precision composite classification method was then constructed through method selection and weight allocation. The results showed that: (1) compared with discriminant analysis, machine learning significantly improved R but significantly decreased r, with LS-SVM achieving the highest R (92.8%) and FDA and F-BDA the highest r (80.2%), while m showed no significant difference; (2) the composite classification method, built by selecting M-BDA, FDA, ANN, and KNN and weighting them by accuracy, improved R (95.3%) and r (89.0%) simultaneously and raised m from below 84% to 92.2%, and both theoretical calculations and practical results confirmed its general effectiveness; (3) the Kappa coefficient of the composite method exceeded 0.8, indicating high consistency and reliability, and the validation set m-F1 measure improved significantly by 21.2%, greatly enhancing the model's generalization ability; (4) five indicators, namely elegance, off-flavor, aftertaste, moistness, and clarity, played a major role in classification, consistent with the style characteristics of the Liqun brand; (5) the misjudged samples (6.5%), whose indicator scores did not match their actual module categories, were attributed to trade-offs among stock, cost, and quality, and generally fell within the adjustment range of tobacco leaf formulas.
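Results (2) and (3) describe combining M-BDA, FDA, ANN, and KNN by accuracy weighting and then checking the composite with the Kappa coefficient and an m-F1 measure. The sketch below shows one plausible reading of that pipeline: hard voting in which each method's vote counts in proportion to its accuracy, followed by from-scratch Cohen's kappa and macro-averaged F1. It assumes that the weights are the methods' accuracies and that m-F1 denotes macro-F1; the function names, module labels, predictions, and weights are hypothetical toy values, not the study's data or its exact weighting rule.

```python
from collections import Counter, defaultdict


def weighted_vote(predictions, weights):
    """Accuracy-weighted hard voting (assumed scheme, not the paper's exact rule).

    predictions: method name -> list of predicted module labels, one per sample
    weights: method name -> accuracy used as that method's vote weight
    """
    methods = list(predictions)
    n_samples = len(predictions[methods[0]])
    combined = []
    for i in range(n_samples):
        scores = defaultdict(float)
        for m in methods:
            scores[predictions[m][i]] += weights[m]
        # Highest summed weight wins; iterating sorted labels breaks ties toward the smallest label
        combined.append(max(sorted(scores), key=lambda lab: scores[lab]))
    return combined


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[lab] * pred_counts[lab] for lab in labels) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed meaning of m-F1)."""
    f1_scores = []
    for lab in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 8 samples, 4 module categories A-D
y_true = ["A", "A", "B", "B", "C", "C", "D", "D"]
preds = {
    "M-BDA": ["A", "B", "B", "B", "C", "D", "D", "D"],
    "FDA":   ["A", "A", "B", "C", "C", "C", "D", "A"],
    "ANN":   ["A", "A", "A", "B", "C", "C", "D", "D"],
    "KNN":   ["B", "A", "B", "B", "D", "C", "D", "D"],
}
# Hypothetical per-method accuracies used as vote weights
acc_weights = {"M-BDA": 0.80, "FDA": 0.80, "ANN": 0.75, "KNN": 0.70}

combined = weighted_vote(preds, acc_weights)
for name, labels in list(preds.items()) + [("composite", combined)]:
    print(name, round(cohen_kappa(y_true, labels), 3), round(macro_f1(y_true, labels), 3))
```

On this toy data each base method makes different mistakes, so the weighted vote recovers every label and the composite reaches a kappa and macro-F1 of 1.0, while M-BDA alone scores about 0.667 and 0.733. This only illustrates why combining complementary methods can raise both metrics; real gains depend on how diverse and accurate the base classifiers are.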