Video-Based Person Re-Identification Using Long-Short Term Temporal Relationship Network

Person re-identification is an important research direction in computer vision that aims to identify and track the same person across different surveillance cameras. Because video frames are linked by multiple temporal relationships, from which motion patterns and fine-grained features of the target can be obtained, video-based re-identification offers richer spatio-temporal cues than image-based re-identification and is closer to real-world applications. The key question is how to mine these spatio-temporal cues as features for video re-identification. To address video-based person re-identification, this paper proposes a Transformer-based long- and short-term temporal relationship network, the Long and Short Time Transformer (LSTT). The network contains long-term and short-term temporal relationship modules that extract important temporal information and strengthen the feature representation. The long-term module uses memory cues to store information from each frame and establishes global connections at every frame. The short-term module considers interactions between adjacent frames to learn fine-grained target information and improve the feature representation. In addition, to improve the model's adaptability to different target features, a multi-scale module with convolution kernels of different sizes is designed. With its multiple convolutional receptive fields, this module covers the target region more comprehensively and further improves the model's generalization performance. Experimental results on three datasets, MARS, MARS_DL, and iLIDS-VID, show that the LSTT model achieves state-of-the-art performance.
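The difference between relating every frame to the whole clip and relating it only to its neighbours can be illustrated with a small sketch. This is not the LSTT architecture: it has no learned projections, no memory cue, and no multi-scale convolution. It only assumes that each frame is already a feature vector and applies plain scaled dot-product attention, either over all frames or over a window of adjacent frames. The function name `temporal_attention` and the `window` parameter are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def temporal_attention(frames, window=None):
    """Re-weight per-frame features with scaled dot-product attention.

    window=None: each frame attends to every frame (global, long-term view).
    window=k:    each frame attends only to frames at most k steps away
                 (adjacent-frame, short-term view).
    Illustrative only; the frame itself serves as query, key and value.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    out = []
    for t, query in enumerate(frames):
        if window is None:
            idx = list(range(len(frames)))
        else:
            lo = max(0, t - window)
            hi = min(len(frames), t + window + 1)
            idx = list(range(lo, hi))
        weights = softmax([dot(query, frames[j]) / scale for j in idx])
        out.append(
            [sum(w * frames[j][d] for w, j in zip(weights, idx)) for d in range(dim)]
        )
    return out


# Toy clip of four 2-dimensional frame features (made-up values).
clip = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
long_term = temporal_attention(clip)             # global connections
short_term = temporal_attention(clip, window=1)  # adjacent frames only
# One simple way to combine the two views: element-wise average.
fused = [[(a + b) / 2 for a, b in zip(l, s)] for l, s in zip(long_term, short_term)]
```

Averaging the two outputs is just one simple fusion choice for the sketch; the abstract does not state how the two modules are combined.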

Keywords: video-based person re-identification; Transformer; long-term temporal relationship; short-term temporal relationship; multi-scale module

He Zhimin (何智敏), Qian Jiangbo (钱江波), Yan Diqun (严迪群), Ye Xulun (叶绪伦), Wang Chong (王翀)

School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China

Zhejiang Key Laboratory of Mobile Network Application Technology, Ningbo, Zhejiang 315211, China

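Funding: National Natural Science Foundation of China (62271274); Ningbo Science and Technology Project (2024Z004); Ningbo Science and Technology Project (2023Z059)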

2024

Acta Electronica Sinica (电子学报)
Chinese Institute of Electronics (中国电子学会)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(8)