
Research on the Twin Check Abnormal Sample Detection Method of Mid-Infrared Spectroscopy

Mid-infrared absorption spectroscopy is currently one of the most promising techniques for non-invasive blood glucose measurement. The accuracy of the measured glucose concentration depends closely on the reliability of the spectral signal. Acquisition of mid-infrared spectra is easily disturbed by environmental or human factors, which produces abnormal spectra containing a large amount of interference. Abnormal samples reduce the validity and reliability of the prediction model, so detecting and removing them is essential. This study proposes a twin check abnormal sample detection method that screens out and removes abnormal samples accurately. The algorithm has two stages. First, Monte Carlo cross-validation is used to screen abnormal samples preliminarily and improve the stability of the spectral sample set. Second, on the theoretical basis that the squared Mahalanobis distance approximately follows a chi-square distribution, the optimal threshold is determined adaptively and the remaining data set is screened for abnormal samples again. The study used 64 glucose mixture phantom solution samples containing glucose, albumin, urea, lactic acid, fructose, and cholesterol. In the first stage, the twin check method exploits the sensitivity of the prediction error sum of squares to abnormal samples to make a preliminary judgment, detecting 3 abnormal samples. The PLS calibration model built after removing them had a correlation coefficient of 0.91 and an RMSECV of 60.17 mg·dL⁻¹. In the second stage, abnormal samples were identified adaptively using the chi-square approximation of the squared Mahalanobis distance. A total of 12 abnormal samples were detected, and the PLS model built after removing all of them performed better, with a correlation coefficient of 0.99 and an RMSECV of 57.77 mg·dL⁻¹. Compared with no abnormal sample removal, the PCA-MD method, and the Monte Carlo method, the twin check method gave the best results, demonstrating its advantage in abnormal sample detection. Relative to the PLS model without abnormal sample removal, the correlation coefficient rose from 0.86 to 0.99 and the RMSECV fell from 67.51 mg·dL⁻¹ to 57.77 mg·dL⁻¹, improvements of 15.12% and 14.42%, respectively. The study addresses the problem that existing abnormal sample detection methods are sensitive to the threshold and therefore misclassify normal samples or miss abnormal ones. The method detects and removes abnormal samples accurately, which improves the accuracy and predictive performance of the model, and offers one approach to removing abnormal samples from mid-infrared spectral data sets.
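The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: ordinary least squares on PCA scores stands in for the PLS model, the stage 1 cutoff is the mean plus three standard deviations of the per-sample held-out squared error, and stage 2 uses a fixed significance level `alpha` rather than the adaptive threshold rule of the paper, which the abstract does not specify. The function names `pca_scores`, `mc_cv_errors`, `mahalanobis_sq`, and `twin_check` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def pca_scores(X, n_components):
    """Project mean-centered spectra onto their leading principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def mc_cv_errors(T, y, n_runs=500, train_frac=0.8, seed=0):
    """Mean squared held-out prediction error per sample over random splits.

    Assumption: least squares on the scores T stands in for a PLS model.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    A = np.column_stack([np.ones(n), T])  # intercept plus scores
    sq_err_sum = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_runs):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        resid = y[test] - A[test] @ coef
        sq_err_sum[test] += resid ** 2
        counts[test] += 1
    return sq_err_sum / np.maximum(counts, 1)


def mahalanobis_sq(scores):
    """Squared Mahalanobis distance of each row from the sample mean."""
    centered = scores - scores.mean(axis=0)
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def twin_check(X, y, n_components=2, alpha=0.025):
    """Return (stage 1 outlier indices, stage 2 outlier indices)."""
    idx = np.arange(len(y))
    # Stage 1: samples with unusually large held-out prediction error.
    # Assumption: a mean + 3 SD cutoff; the paper's rule may differ.
    err = mc_cv_errors(pca_scores(X, n_components), y)
    stage1 = err > err.mean() + 3.0 * err.std()
    keep = idx[~stage1]
    # Stage 2: squared Mahalanobis distance of the remaining samples,
    # compared with a chi-square quantile (df = number of score columns).
    # Assumption: fixed alpha instead of the paper's adaptive threshold.
    d2 = mahalanobis_sq(pca_scores(X[keep], n_components))
    stage2 = d2 > chi2.ppf(1.0 - alpha, df=n_components)
    return idx[stage1], keep[stage2]
```

Calling `twin_check(X, y)` with a spectra matrix `X` (samples by wavelengths) and a concentration vector `y` returns two index arrays, one per stage. Recomputing the PCA scores after stage 1 means the distances in stage 2 are not distorted by samples that were already removed.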

Infrared spectroscopy; Abnormal sample; Twin check; Adaptive threshold

张朱珊莹、张若静、顾瀚文、谢勤岚、张献文、撒继铭、刘繄


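College of Biomedical Engineering, South-Central Minzu University, Wuhan, Hubei 430074

Key Laboratory of Cognitive Science, State Ethnic Affairs Commission, Wuhan, Hubei 430074

Hubei Key Laboratory of Medical Information Analysis and Tumor Diagnosis and Treatment, Wuhan, Hubei 430074

School of Information Engineering, Wuhan University of Technology, Wuhan, Hubei 430070

临沂格莱普园林机械有限公司, Linyi, Shandong 276700

School of Mechanical and Electronic Engineering, Wuhan University of Technology, Wuhan, Hubei 430070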



National Natural Science Foundation of China (61501526, 61178087)

2024

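Spectroscopy and Spectral Analysis (光谱学与光谱分析)
Chinese Optical Society

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.897
ISSN: 1000-0593
Year, volume (issue): 2024, 44(6)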