Recognition and classification of trouser silhouettes based on Vision Transformer and transfer learning
To improve user satisfaction when purchasing clothes on e-commerce platforms, shopping websites currently rely on manually entered text or image search to help consumers find clothing. However, when clothing information is not detailed, it is difficult for consumers to find satisfactory items in a short time. To help consumers quickly find clothing that meets their needs, shopping websites should classify clothes according to user preferences and styles, and label each item. For massive garment image datasets, however, manual annotation and classification require considerable time and labor and are susceptible to human subjectivity. It is therefore necessary to study automatic image classification methods. Deep learning algorithms can extract useful features from large-scale clothing image data and classify them with far higher accuracy and speed than traditional manual labeling and classification, shortening consumers' search time and narrowing the search scope, thus providing more accurate results.

To address the inaccurate classification of existing trouser-silhouette recognition models, a Vision Transformer model with a self-attention mechanism was adopted to classify trouser-silhouette images. Firstly, by adding a self-attention mechanism and enhancing useful features, the interference of image background and other irrelevant information with silhouette recognition was reduced. Secondly, transfer learning was used to train and validate the model on four types of trouser silhouettes: wide-leg pants, bell-bottom pants, tights and harem pants. Finally, the improved Vision Transformer model was compared with traditional CNN models to verify its effectiveness. Specifically, a CNN structure was introduced into the input layer of the Vision Transformer, Dropout was replaced by the DropPath method in the encoder layers, and the output layer was set to a fully connected layer. Transfer learning was then used to train the improved Vision Transformer model on the augmented dataset of this study.

Experimental results show that data augmentation effectively improves the model's generalization ability: the average recognition rate on the test set increased from 93.99% before augmentation to 95.48% after augmentation. Among the four models compared, the improved Vision Transformer achieved a recognition accuracy of 97.72% on the validation set, 10% higher than the MobileNetV2 model. The proposed model identifies trouser silhouettes well and has high application value in the classification and recognition of clothing styles and silhouettes.
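The DropPath substitution mentioned above (stochastic depth) can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the paper's actual implementation; the function name and shapes are assumptions for the example. Unlike Dropout, which zeroes individual activations, DropPath drops the entire residual-branch output for randomly chosen samples and rescales the survivors so the expected value is preserved:

```python
import numpy as np

def drop_path(x, drop_prob, training=True, rng=None):
    """Stochastic depth: zero whole residual branches per sample.

    x         -- branch output of shape (batch, ...).
    drop_prob -- probability of dropping the branch for a sample.
    """
    if drop_prob == 0.0 or not training:
        return x
    if rng is None:
        rng = np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    # One Bernoulli draw per sample; broadcast over all remaining dims.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = rng.binomial(1, keep_prob, size=mask_shape).astype(x.dtype)
    # Rescale surviving samples so the expected output is unchanged.
    return x / keep_prob * mask

# In a Transformer encoder block, DropPath typically wraps each
# residual branch rather than the activations inside it:
#   x = x + drop_path(attention(norm(x)), p)
#   x = x + drop_path(mlp(norm(x)), p)
```

At inference time (`training=False`) the function is the identity, so no extra cost is incurred when the trained model is deployed.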