For cross-modal person Re-IDentification (Re-ID) in visible-infrared images, methods based on modality conversion and adversarial networks yield associative information between modalities but fall short in effective feature recognition. This paper therefore proposes a two-stage approach that uses visual-text matching and graph embedding to improve re-identification. The method adopts a context-optimization scheme to construct learnable text templates that generate person descriptions serving as associative information between modalities. Specifically, in the first stage, unified text descriptions of the same person across different modalities, produced with the Contrastive Language-Image Pre-training (CLIP) model, are used as prior information to help reduce modality differences. In the second stage, a cross-modal constraint framework based on graph embedding is applied, and a modality-adaptive loss function is designed to improve person recognition accuracy. Extensive experiments on the SYSU-MM01 and RegDB datasets confirm the method's efficacy, achieving a Rank-1 accuracy of 64.2% and a mean Average Precision (mAP) of 60.2% on SYSU-MM01, a significant improvement in cross-modal person re-identification.
Key words
Person Re-IDentification (Re-ID) / Cross-modal / Contrastive Language-Image Pre-training (CLIP) model / Context optimization / Graph embedding
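The context-optimization scheme described in the abstract can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration of the general CoOp-style idea (learnable context vectors prepended to a per-identity token, contrasted against image features with a CLIP-style symmetric loss), not the paper's actual implementation; the tiny `TextEncoder` here merely stands in for CLIP's frozen text encoder, and all class names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    """CoOp-style learnable text template (hypothetical sketch).

    M learnable context vectors (the "a photo of a ..." part) are shared
    across identities and prepended to one learnable token per person ID.
    """
    def __init__(self, num_ids, ctx_len=4, dim=32):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)
        self.id_tokens = nn.Embedding(num_ids, dim)

    def forward(self, ids):
        ctx = self.ctx.unsqueeze(0).expand(ids.size(0), -1, -1)
        tok = self.id_tokens(ids).unsqueeze(1)
        return torch.cat([ctx, tok], dim=1)  # (B, ctx_len + 1, dim)

class TextEncoder(nn.Module):
    """Stand-in for CLIP's frozen text encoder (not the real model)."""
    def __init__(self, dim=32):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, prompts):
        # pool token embeddings, project, and L2-normalize the feature
        return F.normalize(self.proj(prompts.mean(dim=1)), dim=-1)

def clip_contrastive_loss(img_feat, txt_feat, temperature=0.07):
    """Symmetric InfoNCE between image and text features, as in CLIP."""
    logits = img_feat @ txt_feat.t() / temperature
    labels = torch.arange(img_feat.size(0))
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

# Usage sketch: text features for three identities are pulled toward
# image features of the same person from either modality (visible or IR),
# which is how a unified description can bridge the modality gap.
ids = torch.tensor([0, 1, 2])
prompts = PromptLearner(num_ids=4)(ids)
txt_feat = TextEncoder()(prompts)
img_feat = F.normalize(torch.randn(3, 32), dim=-1)  # placeholder features
loss = clip_contrastive_loss(img_feat, txt_feat)
```

Because the same learned prompt produces one text feature per identity regardless of which modality the image came from, that text feature can act as the modality-shared anchor the abstract refers to.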