Diversified Data Expansion Method Using Text-Based Person Image Search
In recent years, text-based person search (TBPS) has become increasingly important in security and criminal investigation. However, existing datasets contain a limited number of person images and simplistic text descriptions, which hinders a model's ability to capture diverse person features and restricts progress in TBPS. To address this issue, we propose a method for generating and selecting diversified person text-image pairs. In the data generation stage, person text descriptions are produced by a constituency parsing model combined with large language models, and the corresponding person images are then produced by conditional image generation models. In the image filtering stage, the PickScore scoring function evaluates the similarity between each generated person image and its text description, and low-scoring pairs are filtered out. In the text-image pair filtering stage, multimodal large models estimate the matching probability between each person image and its text description, and pairs that fall below a predefined threshold are discarded. The remaining high-quality pairs are added to existing datasets as positive samples. Experiments on several public TBPS datasets show that augmenting the datasets with this method yields notable improvements for baseline models on the Rank-k and mean Average Precision (mAP) metrics. Furthermore, we explore the impact of posture and style control on the augmentation results, providing insights for future research.
diversified person data expansion; constituency parsing analysis model; large language model; conditional image generative model; multimodal large model
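The two filtering stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Pair` type, the `filter_pairs` function, both scorer callables, and both threshold parameters are hypothetical names introduced here. In particular, the abstract only says that low-scoring pairs are removed in the PickScore stage, so treating that stage as a fixed threshold is an assumption. The scorers are passed in as plain functions so that a real PickScore model or multimodal large model could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pair:
    """A generated text-image pair (hypothetical container for illustration)."""
    text: str
    image_id: str


def filter_pairs(
    pairs: List[Pair],
    similarity_fn: Callable[[Pair], float],   # stand-in for a PickScore-style scorer
    match_prob_fn: Callable[[Pair], float],   # stand-in for a multimodal-model matcher
    sim_threshold: float,                     # assumed cut-off for stage 1
    prob_threshold: float,                    # the "predefined threshold" of stage 2
) -> List[Pair]:
    """Keep only pairs that pass both filtering stages, in their original order."""
    # Stage 1: image filtering by text-image similarity score.
    stage1 = [p for p in pairs if similarity_fn(p) >= sim_threshold]
    # Stage 2: pair filtering by estimated matching probability.
    return [p for p in stage1 if match_prob_fn(p) >= prob_threshold]


if __name__ == "__main__":
    # Toy scores only; real scores would come from the respective models.
    toy_pairs = [Pair("a man in a red coat", "img_1"), Pair("a woman with a bag", "img_2")]
    toy_sim = {"img_1": 0.9, "img_2": 0.2}
    toy_prob = {"img_1": 0.95, "img_2": 0.99}
    kept = filter_pairs(
        toy_pairs,
        lambda p: toy_sim[p.image_id],
        lambda p: toy_prob[p.image_id],
        sim_threshold=0.5,
        prob_threshold=0.6,
    )
    print([p.image_id for p in kept])
```

Running the stages in sequence means the second, typically more expensive, scorer is only called on pairs that already passed the first stage. The surviving pairs would then be added to the training set as positive samples.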