Recognition and classification of trouser silhouettes based on Vision Transformer and transfer learning
To improve user satisfaction when purchasing clothes on e-commerce platforms, shopping websites currently rely on manually entered text or image search to help consumers find clothing. However, when clothing information is not detailed, it is difficult for consumers to find satisfactory items in a short time. To help consumers quickly find clothing that meets their needs, shopping websites should classify clothes according to user preferences and styles, and label each item. For massive garment image datasets, however, manual annotation and classification require considerable time and labor and are susceptible to human subjectivity. It is therefore necessary to study automatic image classification methods. Deep learning algorithms can extract useful features from large-scale clothing image data and classify them with far higher accuracy and speed than traditional manual labeling and classification, shortening consumers' search time and narrowing the search scope, thus providing more accurate results.

To address the inaccurate classification of existing trouser-silhouette recognition models, a Vision Transformer model with a self-attention mechanism was adopted to classify trouser-silhouette images. Firstly, by adding a self-attention mechanism and enhancing useful features, the interference of image background and other irrelevant information with silhouette recognition was reduced. Secondly, transfer learning was used to train and validate the model on four types of trouser silhouettes: wide-leg pants, bell-bottom pants, tights and harem pants. Finally, the improved Vision Transformer model was compared with traditional CNN models to verify its effectiveness. Specifically, a CNN structure was introduced into the input layer of the Vision Transformer, Dropout was replaced by the DropPath method in the encoder layers, and the output layer was set to a fully connected layer. Transfer learning was then used to train the improved Vision Transformer model on the augmented dataset of this study.

Experimental results show that data augmentation effectively improves the model's generalization ability: the average recognition rate on the test set increased from 93.99% before augmentation to 95.48% after augmentation. Among the four models compared, the improved Vision Transformer achieved a recognition accuracy of 97.72% on the validation set, 10% higher than the MobileNetV2 model. The proposed model identifies trouser silhouettes well and has high application value in the classification and recognition of clothing styles and silhouettes.
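The DropPath substitution mentioned above (stochastic depth) can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the paper's actual implementation; the function name and shapes are assumptions for the example. Unlike Dropout, which zeroes individual activations, DropPath drops the entire residual-branch output for randomly chosen samples and rescales the survivors so the expected value is preserved:

```python
import numpy as np

def drop_path(x, drop_prob, training=True, rng=None):
    """Stochastic depth: zero whole residual branches per sample.

    x         -- branch output of shape (batch, ...).
    drop_prob -- probability of dropping the branch for a sample.
    """
    if drop_prob == 0.0 or not training:
        return x
    if rng is None:
        rng = np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample; broadcast over all remaining dims.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = rng.binomial(1, keep_prob, size=mask_shape).astype(x.dtype)
    # Rescale surviving samples so the expected output is unchanged.
    return x / keep_prob * mask

# In a Transformer encoder block, DropPath typically wraps each
# residual branch rather than the activations inside it:
#   x = x + drop_path(attention(norm(x)), p)
#   x = x + drop_path(mlp(norm(x)), p)
```

At inference time (`training=False`) the function is the identity, so no extra cost is incurred when the trained model is deployed.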