Sorting of Mixed Oryza sativa L. Seeds by Terahertz Spectrum and Feature Selection Algorithm
Wang Yujie (汪宇杰) 1, Zhang Aolin (张傲林) 1, Gao Chunfang (高春芳) 1
Author information
- 1. School of Investigation, People's Public Security University of China, Beijing 100038
Abstract
Agricultural production safety is an important component of food safety. Japonica rice (Oryza sativa subsp. japonica Kato) is a staple rice for daily consumption, so the rapid detection of inferior, adulterated seed lots is an important research task in this field. In this study, terahertz time-domain spectroscopy was used to collect spectral signals from 220 samples of mixed and pure rice seeds. The spectral data were preprocessed by Fourier transform (FT), converting the time-domain signals into frequency-domain signals that served as the modeling data set, and five pattern recognition models, including QUEST, were evaluated for sorting. The results show that the random forest (RF) algorithm, the successive projections algorithm (SPA), and the coupled variable combination population analysis-iteratively retaining informative variables algorithm (VCPA-IRIV) selected 9, 6, and 25 important feature frequencies, respectively; of these, the feature frequencies selected by the coupled VCPA-IRIV algorithm contained the richest spectral information. To further optimize the models, they were rebuilt on the selected feature frequencies, which gave significantly better analysis speed and recognition accuracy than full-spectrum modeling. The QUEST and KNN classifiers built on the 25 feature frequencies screened by VCPA-IRIV both reached 100% accuracy in identifying whether a sample was mixed. VCPA-IRIV can therefore effectively select information-rich terahertz feature frequencies and improve the accuracy of the resulting recognition models. The identification model based on terahertz spectroscopy and a coupled feature selection algorithm is fast and accurate, and offers a new approach for detecting inferior mixed japonica rice seeds.
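The workflow summarized in the abstract (time-domain signal, Fourier transform, feature-frequency selection, classification) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the traces are synthetic, a naive discrete Fourier transform stands in for the FT preprocessing, a simple Fisher-score ranking is swapped in for RF, SPA, and VCPA-IRIV, and the KNN classifier uses a hypothetical k = 3. Function names such as `run_demo` are likewise invented here.

```python
import cmath
import math
import random


def amplitude_spectrum(signal):
    """Naive DFT amplitude for the non-negative frequency bins."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def fisher_scores(spectra, labels):
    """Score each bin by between-class separation over within-class spread.

    Stand-in for the paper's selectors (RF, SPA, VCPA-IRIV), not a copy of them.
    """
    scores = []
    for j in range(len(spectra[0])):
        groups = {0: [], 1: []}
        for row, y in zip(spectra, labels):
            groups[y].append(row[j])
        means = {y: sum(v) / len(v) for y, v in groups.items()}
        variances = {
            y: sum((x - means[y]) ** 2 for x in v) / len(v)
            for y, v in groups.items()
        }
        scores.append(
            (means[0] - means[1]) ** 2 / (variances[0] + variances[1] + 1e-12)
        )
    return scores


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if votes * 2 > k else 0


def make_sample(rng, mixed, n_points=64):
    """Synthetic trace: 'mixed' samples carry an extra component at bin 9."""
    extra = 0.8 if mixed else 0.0
    return [
        math.sin(2 * math.pi * 5 * t / n_points)
        + extra * math.sin(2 * math.pi * 9 * t / n_points)
        + rng.gauss(0, 0.1)
        for t in range(n_points)
    ]


def run_demo(seed=0, n_per_class=20, n_features=3):
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class  # 0 = pure, 1 = mixed
    spectra = [amplitude_spectrum(make_sample(rng, y == 1)) for y in labels]

    # Hold out every fourth sample; select features on the training part only.
    test_idx = set(range(0, len(labels), 4))
    pairs = list(zip(spectra, labels))
    train = [p for i, p in enumerate(pairs) if i not in test_idx]
    test = [p for i, p in enumerate(pairs) if i in test_idx]

    scores = fisher_scores([s for s, _ in train], [y for _, y in train])
    selected = sorted(range(len(scores)),
                      key=scores.__getitem__, reverse=True)[:n_features]

    def pick(row):
        return [row[j] for j in selected]

    train_x = [pick(s) for s, _ in train]
    train_y = [y for _, y in train]
    correct = sum(knn_predict(train_x, train_y, pick(s)) == y for s, y in test)
    return selected, correct / len(test)


if __name__ == "__main__":
    selected_bins, accuracy = run_demo()
    print("selected frequency bins:", selected_bins)
    print("hold-out accuracy:", accuracy)
```

Selecting features on the training split only, as in `run_demo`, avoids leaking hold-out information into the feature ranking; the same precaution would apply when substituting any of the selectors named in the abstract.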
Key words
Oryza sativa subsp. japonica Kato seeds/mixed/terahertz time-domain spectroscopy/coupled feature selection algorithm/pattern recognition
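Funding
Double First-Class Innovation Research Special Project in Criminal Science and Technology, People's Public Security University of China (2023SYL06)
Publication year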
2024