Single-Cell Blood Classification Method Based on Fluorescence Optical Tweezers and Machine Learning

Identifying species from differences in blood composition is important in biomedicine, health care, customs inspection, criminal investigation, food safety, and wildlife protection. Existing studies, however, work on cell populations and ignore the heterogeneity of single cells, so a blood fluorescence spectral classification method that operates on single cells is urgently needed. This work proposes a single-cell blood classification method based on fluorescence optical tweezers and machine learning: optical tweezers capture a single cell, a fluorescence spectrum detection system records its fluorescence spectrum, and a machine learning model classifies the spectrum. First, a fluorescence optical tweezers system was designed and built to perform single-cell capture, fluorescence imaging, and spectral detection. Then, diluted red blood cell suspensions from four animals (horse, pig, dog, and chicken) were prepared. With a 440 nm laser as the fluorescence excitation source, 100 fluorescence spectra were acquired per species, 400 in total, and were preprocessed by background removal, smoothing, and normalization to suppress instrument noise and environmental interference in the signal. Subsequently, a random forest classification model was established. With the number of sampled features fixed at k=20, the relationship between the number of trees in the model and the prediction accuracy was analyzed; at m=500 decision trees the classification accuracy stabilized, giving both high classification accuracy and good running efficiency. Further, 30% of the sample data was used as the test set and the remaining 70% as the training set, and the relationship between wavelength and feature importance was calculated. Ten classification accuracies were obtained, and their mean was taken as the model's classification accuracy: the final test-set accuracy reached 93.1%, with a variance of 0.31%. Finally, a confusion matrix was computed to evaluate the model's prediction accuracy; chicken had the highest classification accuracy and horse the lowest. The analysis showed that the substances contributing most to the classification were porphyrins, heme, and flavin adenine dinucleotide. In conclusion, combining fluorescence optical tweezers with machine learning enables blood classification at the single-cell level, and the high classification accuracy supports the feasibility and effectiveness of the method. The method also meets modeling needs without requiring many samples, avoids problems such as the low fluorescence self-absorption intensity caused by low concentration, classifies quickly and accurately, and has important potential application value.
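The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the moving-average smoother, the max normalization, and the use of population variance are all assumptions chosen for illustration, since the abstract does not specify which smoothing, normalization, or variance estimator was used.

```python
from statistics import mean, pvariance

# The four species named in the abstract; the label strings are illustrative.
SPECIES = ["horse", "pig", "dog", "chicken"]


def preprocess(spectrum, background, window=5):
    """Background removal, smoothing, and normalization of one spectrum.

    Assumptions: a centered moving average stands in for the unspecified
    smoothing step, and scaling by the maximum stands in for normalization.
    """
    # 1) Subtract the background spectrum channel by channel.
    corrected = [s - b for s, b in zip(spectrum, background)]
    # 2) Centered moving average; the window shrinks at both edges.
    half = window // 2
    smoothed = []
    for i in range(len(corrected)):
        segment = corrected[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    # 3) Scale so the largest value is 1 (skipped if there is no positive peak).
    peak = max(smoothed)
    return [v / peak for v in smoothed] if peak > 0 else smoothed


def confusion_matrix(y_true, y_pred, labels=SPECIES):
    """Rows are true species, columns are predicted species."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_accuracy(matrix, labels=SPECIES):
    """Fraction of each true species that was predicted correctly."""
    result = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        result[label] = matrix[i][i] / total if total else 0.0
    return result


def summarize(accuracies):
    """Mean and population variance of repeated test-set accuracies."""
    return mean(accuracies), pvariance(accuracies)
```

In use, each of the 400 spectra would pass through `preprocess`, a classifier would be trained and tested on ten 70%/30% splits, and `summarize` would reduce the ten test accuracies to a mean and variance. The classifier itself is left out of the sketch; one plausible configuration, assuming scikit-learn is used and that k corresponds to the number of features considered per split, would be `RandomForestClassifier(n_estimators=500, max_features=20)`.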

Blood classification; Fluorescence optical tweezers; Machine learning; Single cell; Random forest classification model

Zhou Zhehai, Xiong Tao, Zhao Shuang, Zhang Fan, Zhu Guixian

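Key Laboratory of the Ministry of Education for Optoelectronic Measurement Technology and Instrument, Beijing Information Science and Technology University, Beijing 100192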

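National Natural Science Foundation of China; Beijing Great Wall Scholars Support Program; Beijing Young Top-Notch Talent Support Program

61875237; CIT&TCD20190323; Z2019042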

2024

Spectroscopy and Spectral Analysis
Chinese Optical Society

CSTPCD; Peking University Core Journal
Impact factor: 0.897
ISSN: 1000-0593
Year, Volume (Issue): 2024, 44(4)