Diversified Data Expansion Method Using Text-Based Person Image Search

Abstract: In recent years, text-based person search (TBPS) has played an increasingly important role in fields such as security and criminal investigation. However, existing datasets contain relatively few person images, and the accompanying text descriptions are monotonous, so models cannot fully learn person features, which limits further progress in TBPS. To address this problem, we propose an expansion method that generates and filters diversified person image-text pairs. In the data generation stage, person text descriptions are first generated by combining a constituency parsing model with a large language model, and a conditional image generation model then produces the corresponding person images from the generated descriptions. In the text-guided image filtering stage, the PickScore scoring function computes a similarity score between each generated person image and its text description; images with low scores are filtered out at a coarse granularity, and only the higher-scoring images and their descriptions are retained. In the image-text pair filtering stage, a multimodal large vision-language model computes the matching probability between each person image and its text description, and pairs whose probability falls below a threshold are discarded as a fine-grained filter; the remaining high-quality pairs are added to the existing datasets as positive samples. Experiments on multiple public TBPS datasets show that, after the datasets are expanded with this method, different retrieval baseline models improve substantially on Rank-k and mean Average Precision (mAP). In addition, we examine how pose control and style control affect the expansion results, offering a direction for further research.
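The coarse-to-fine filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Pair`, `coarse_filter`, `fine_filter`, and `expand_dataset` are hypothetical, the two scoring callables stand in for PickScore and for the multimodal model's matching probability, and the abstract does not say whether the coarse stage keeps a fixed number of images or applies a score cutoff, so the `keep_top_k` rule is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A scorer takes (text, image_id) and returns a float.
Scorer = Callable[[str, str], float]


@dataclass(frozen=True)
class Pair:
    """One candidate image-text pair; image_id stands in for the image itself."""
    text: str
    image_id: str


def coarse_filter(text: str, image_ids: List[str],
                  score_fn: Scorer, keep_top_k: int) -> List[Pair]:
    """Coarse stage: rank the generated images for one description by a
    similarity score and keep only the keep_top_k best (assumed rule)."""
    ranked = sorted(image_ids, key=lambda img: score_fn(text, img), reverse=True)
    return [Pair(text, img) for img in ranked[:keep_top_k]]


def fine_filter(pairs: List[Pair], match_prob_fn: Scorer,
                threshold: float) -> List[Pair]:
    """Fine stage: discard pairs whose matching probability is below threshold."""
    return [p for p in pairs if match_prob_fn(p.text, p.image_id) >= threshold]


def expand_dataset(candidates: Dict[str, List[str]], score_fn: Scorer,
                   match_prob_fn: Scorer, keep_top_k: int = 2,
                   threshold: float = 0.5) -> List[Pair]:
    """Run both stages and return the pairs to add as positive samples."""
    kept: List[Pair] = []
    for text, image_ids in candidates.items():
        survivors = coarse_filter(text, image_ids, score_fn, keep_top_k)
        kept.extend(fine_filter(survivors, match_prob_fn, threshold))
    return kept


# Toy usage with lookup tables in place of real models (illustrative values).
toy_scores = {("t1", "a"): 0.9, ("t1", "b"): 0.2, ("t1", "c"): 0.7}
toy_probs = {("t1", "a"): 0.8, ("t1", "b"): 0.9, ("t1", "c"): 0.3}

result = expand_dataset(
    {"t1": ["a", "b", "c"]},
    score_fn=lambda t, i: toy_scores[(t, i)],
    match_prob_fn=lambda t, i: toy_probs[(t, i)],
)
print(result)
```

In this toy run, image `b` is dropped by the coarse stage even though its matching probability is high, which shows that the order of the two stages matters: the fine-grained check is only applied to pairs that survive the cheaper coarse ranking.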

Keywords:

diversified person data expansion; constituency parsing analysis model; large language model; conditional image generative model; multimodal large model

Authors:

Wang Jingyao (王靖尧), Cao Min (曹敏)

Affiliation:

School of Computer Science and Technology, Soochow University, Suzhou 215000, Jiangsu, China

Publication year:

2024

DOI:

10.19678/j.issn.1000-3428.0068883

Computer Engineering (计算机工程)

East China Institute of Computer Technology (华东计算技术研究所); Shanghai Computer Society (上海市计算机学会)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.581

ISSN: 1000-3428

Year, volume (issue): 2024, 50(12)