Visible-Infrared Person Re-Identification Algorithm Based on Feature Fusion
Multi-modal Feature Fusion for Visible-Infrared Person Re-Identification
申汶山 1, 王洁 1, 黄琴 2
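Author information
- 1. School of Mathematics and Computer Science, Shanxi Normal University, Taiyuan, Shanxi 030031
- 2. Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030000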
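Abstract (translated from the Chinese original)
The rapid development and deployment of intelligent surveillance devices produces massive volumes of surveillance data that are difficult to process by traditional manual effort. In addition, with the wide adoption of RGB-IR dual-mode cameras in recent years, infrared surveillance video can now be used to assist related vision tasks. To retrieve the same pedestrian more accurately across visible and infrared surveillance video, a visible-infrared person re-identification algorithm based on feature fusion is proposed. The algorithm first designs a Transformer-based feature extractor that generates discriminative features from the data of both modalities. Then, to exploit the complementary information in the two modalities, a bidirectional multi-modal attention method is proposed that aligns the features of the different modalities while fusing their complementary semantic information; finally, a classifier performs classification and identification. Experiments on public datasets show that the proposed algorithm has better generalization ability and robustness than most existing algorithms, with a prediction accuracy of 99.86% on the SYSU-MM01 dataset and 94.13% on the LLCM dataset.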
Abstract
Owing to the rapid development and deployment of intelligent monitoring devices, the massive volume of monitoring data they generate is difficult to process by human effort alone. In addition, with the wide application of RGB-IR dual-mode cameras in recent years, infrared surveillance data has been used to assist related computer vision tasks. To retrieve the same pedestrians more accurately in visible-infrared dual-modal surveillance videos, a visible-infrared person re-identification algorithm based on feature fusion is proposed. The algorithm first designs a Transformer-based feature extractor to generate discriminative features from images of the two modalities. Then, considering the complementary information between the two modalities, a bidirectional multi-modal attention method is proposed to align the features of the different modalities and simultaneously fuse complementary semantic information. Experiments on public datasets show that the proposed algorithm has better generalization ability and robustness than most existing algorithms. The prediction accuracy reaches 99.86% on the SYSU-MM01 dataset and 94.13% on the LLCM dataset.
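The abstract does not give layer-level details, so the following is only an illustrative sketch of what bidirectional cross-modal attention followed by fusion and classification might look like, not the authors' implementation. It assumes each modality has already been turned into a list of token feature vectors by some feature extractor, omits the learned query/key/value projections a real Transformer layer would have, and uses hypothetical function names (`cross_attention`, `bidirectional_fusion`, `classify`) introduced purely for illustration.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: every query token attends over the
    tokens of the other modality (no learned projections, an assumption)."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    count = len(tokens)
    return [sum(t[j] for t in tokens) / count for j in range(len(tokens[0]))]


def bidirectional_fusion(visible_tokens, infrared_tokens):
    """Attend in both directions, add a residual connection, then pool and
    concatenate the two modality descriptors into one fused vector."""
    vis_from_ir = cross_attention(visible_tokens, infrared_tokens)
    ir_from_vis = cross_attention(infrared_tokens, visible_tokens)
    vis_fused = [[a + b for a, b in zip(t, u)] for t, u in zip(visible_tokens, vis_from_ir)]
    ir_fused = [[a + b for a, b in zip(t, u)] for t, u in zip(infrared_tokens, ir_from_vis)]
    return mean_pool(vis_fused) + mean_pool(ir_fused)  # list concatenation


def classify(fused, weights, biases):
    """Linear classifier over the fused vector, returning identity probabilities."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, biases)]
    return softmax(logits)


# Toy usage with random stand-in features (not real SYSU-MM01 or LLCM data).
rng = random.Random(0)
dim, num_identities = 8, 4
visible_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
infrared_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
fused = bidirectional_fusion(visible_tokens, infrared_tokens)
weights = [[rng.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(num_identities)]
probabilities = classify(fused, weights, [0.0] * num_identities)
```

Attending in both directions lets each modality borrow complementary cues from the other before the two descriptors are concatenated. Whether the paper fuses by concatenation, addition, or some other operator is not stated in the abstract, so the concatenation here is an assumption.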
Key words
visible-infrared person ReID/deep learning/cross-modal learning/computer vision
Publication year
2024