Fashion Clothing Pattern Generation Based on Improved Stable Diffusion
Dress patterns are a window through which people express their personality and sense of fashion. In recent years, with the continuous development of multimodal technology, text-based dress pattern generation has been widely studied. However, existing methods have seen limited practical application owing to poor text-image semantic alignment and low resolution. Since the large-scale language-image pre-training model CLIP was proposed, pre-trained diffusion models combined with CLIP for text-to-image generation have become the mainstream approach in this field. However, the original pre-trained models generalize poorly to downstream tasks: relying solely on a pre-trained model does not allow flexible and accurate control over the color and structure of dress patterns, and the large number of parameters makes re-training from scratch impractical. To address these problems, this study designs FT-SDM-L (Fine Tuning-Stable Diffusion Model-Lion), an improved Stable Diffusion network that uses a dress image-text dataset to update the weights of the cross-attention modules in the original model. Experimental results show that the fine-tuned model improves the ClipScore and HPS v2 scores by 0.08 and 1.22 on average, validating this module's key role in incorporating textual information. Subsequently, to further enhance the model's feature extraction and data mapping capabilities in the apparel domain, a lightweight adapter, Stable-Adapter, is added at the module's output to better capture changes in the input prompts. With only 0.75% additional parameters, the adapter further improves the ClipScore and HPS v2 scores by 0.05 and 0.38. The method achieves good results in both the fidelity and the semantic consistency of clothing pattern generation.
text-image generation; diffusion model; cross-attention mechanism; image generation; computer vision
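The abstract does not specify the internals of Stable-Adapter beyond its placement at the cross-attention output and its small parameter budget. The sketch below illustrates, in PyTorch, one common way such a lightweight residual adapter can be attached after a cross-attention block; the class names, dimensions, and use of nn.MultiheadAttention are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: a hypothetical stand-in for Stable-Adapter."""

    def __init__(self, dim: int, reduction: int = 8):
        super().__init__()
        hidden = max(dim // reduction, 1)
        self.down = nn.Linear(dim, hidden)   # down-projection
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)     # up-projection
        # Zero-init the up-projection so training starts from the identity mapping.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the pre-trained cross-attention output.
        return x + self.up(self.act(self.down(x)))


class AdaptedCrossAttention(nn.Module):
    """Cross-attention block with an adapter appended at its output,
    mirroring the placement described in the abstract (assumed layout)."""

    def __init__(self, dim: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            dim, num_heads, kdim=text_dim, vdim=text_dim, batch_first=True
        )
        self.adapter = BottleneckAdapter(dim)

    def forward(self, image_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # Latent image tokens attend to the text embeddings (cross-attention),
        # then the adapter refines the attended features.
        attended, _ = self.attn(image_tokens, text_tokens, text_tokens)
        return self.adapter(attended)


if __name__ == "__main__":
    block = AdaptedCrossAttention(dim=320, text_dim=768)
    img = torch.randn(2, 64, 320)    # latent image tokens
    txt = torch.randn(2, 77, 768)    # text-encoder embeddings (e.g., CLIP)
    print(block(img, txt).shape)     # torch.Size([2, 64, 320])
```

Because the up-projection is zero-initialized, the adapted block initially reproduces the original cross-attention output, so fine-tuning only the small adapter (and, as in the paper, the cross-attention weights) perturbs the pre-trained model gradually rather than from scratch.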