Text Guided Clothing Image Retrieval Based on Multi-granularity Matching
Text guided image retrieval integrates a query image and a text condition into a multimodal query. Existing methods improve performance by constructing increasingly fine-grained metric learning, but this may cause the model to overfit the target image under imprecise text conditions and make the retrieved results monotonous. To address this issue, we propose a text guided clothing image retrieval method based on feature enhancement and multi-granularity matching. First, noise following a normal distribution is generated from the distribution of the target features, introducing small intra-class jitter. Then, constraints are imposed on the enhanced features according to the fluctuation of the target features: the larger the fluctuation, the greater the penalty on the enhanced features, yielding a coarse-grained matching loss. Finally, we optimize the learning strategy with a dynamic weight that continuously decays over training iterations, unifying the coarse-grained and fine-grained losses. The proposed method reduces the model's rejection of potential target images and improves the diversity of recognized features. Extensive experiments on two publicly available clothing datasets, FashionIQ and Shoes, show that the proposed method improves recall and provides richer retrieval results.
text guided image retrieval; feature enhancement; multi-granularity matching; multi-modal fusion
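To make the training objective in the abstract concrete, below is a minimal PyTorch sketch under our own assumptions: the per-dimension standard deviation of the target batch stands in for the "fluctuation" of the target features, the coarse-grained loss is a contrastive loss against Gaussian-jittered targets weighted by each sample's jitter magnitude, and the decaying dynamic weight is applied to the coarse term with a linear schedule. The function names (`coarse_matching_loss`, `total_loss`) and all hyperparameters are hypothetical illustrations, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def coarse_matching_loss(fused_query, target, jitter_scale=0.1, temperature=0.07):
    """Illustrative coarse-grained matching loss (our reading of the abstract).

    1. Estimate the per-dimension fluctuation of the target features.
    2. Sample Gaussian noise at that scale to create jittered (enhanced) targets.
    3. Compute a contrastive loss against the jittered targets, weighting each
       sample's term by its jitter magnitude so larger fluctuation is penalized more.
    """
    # fused_query, target: (B, D) batches of multimodal query / target features.
    sigma = target.std(dim=0, keepdim=True)                  # (1, D) fluctuation estimate
    noise = torch.randn_like(target) * sigma * jitter_scale  # N(0, (scale * sigma)^2)
    jittered = F.normalize(target + noise, dim=-1)           # enhanced targets, intra-class jitter
    query = F.normalize(fused_query, dim=-1)

    logits = query @ jittered.t() / temperature              # (B, B) similarity matrix
    labels = torch.arange(target.size(0), device=target.device)
    per_sample = F.cross_entropy(logits, labels, reduction="none")

    # Fluctuation-proportional penalty: samples whose jitter is larger
    # contribute more to the coarse loss (assumed weighting).
    fluct = noise.norm(dim=-1)
    weight = 1.0 + fluct / (fluct.mean() + 1e-8)
    return (weight * per_sample).mean()

def total_loss(fine_loss, coarse_loss, step, total_steps, w0=1.0):
    """Unify fine- and coarse-grained losses with a dynamic weight that decays
    over training iterations (linear decay on the coarse term is an assumption)."""
    w = w0 * (1.0 - step / total_steps)
    return fine_loss + w * coarse_loss
```

Under this reading, training emphasizes the noise-tolerant coarse-grained objective early on and gradually shifts toward the fine-grained metric loss, which would explain the reduced rejection of potential target images without sacrificing final ranking precision.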