Research on Generating Hanfu Effect Drawing Based on a Stable Diffusion Model

To address the dynasty confusion that arises in Hanfu effect drawing generation because the costume features of each dynasty are difficult to capture accurately, this paper builds on the Stable Diffusion model. Text and image feature-space vectors are matched according to newly input text prompts, V* is embedded as a new token in the embedding layer and optimized jointly with the cross-attention layer parameters Wk and Wv, and the model finally searches for the minimum of the loss function after relearning the new costume text features. By consulting the literature and historical records, 163 costume text prompts covering the Tang, Song, and Ming dynasties were collected, organized, and added. The generated Hanfu effect drawings show that the model can produce costume images consistent with the characteristics of each dynasty from the text prompts. Compared with three commonly used text-to-image generation algorithms that do not incorporate Hanfu features, its images are clearer and of higher quality. In the ablation experiments, the model optimizes the token V* with a specific ID and, compared with the other approaches, shows higher image alignment and lower text alignment. In the experiments on the Tang, Song, and Ming dynasties, the mean KID and MMD values are both relatively low, indicating that the model is feasible and effective for improving Hanfu effect drawing generation.
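
The abstract reports KID and MMD values but does not say how they were computed. As a rough illustration only, the sketch below shows the unbiased squared-MMD estimator with the cubic polynomial kernel that is commonly used when computing KID. The function names `poly_kernel` and `mmd2_unbiased` are hypothetical, and the toy vectors stand in for the image features (typically Inception embeddings) that a real evaluation would use; none of this is taken from the paper.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3 (assumed, common for KID)."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def mmd2_unbiased(
    real: Sequence[Vector],
    fake: Sequence[Vector],
    kernel: Callable[[Vector, Vector], float] = poly_kernel,
) -> float:
    """Unbiased estimate of squared MMD between two feature sets.

    Lower values mean the two sets look more alike under the kernel.
    Being unbiased, the estimate can come out slightly negative.
    """
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("each set needs at least two feature vectors")

    # Within-set terms skip the diagonal (i == j) to stay unbiased.
    k_rr = sum(
        kernel(real[i], real[j]) for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    k_ff = sum(
        kernel(fake[i], fake[j]) for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))

    # Cross term averages over every real/fake pair.
    k_rf = sum(kernel(x, y) for x in real for y in fake) / (m * n)

    return k_rr + k_ff - 2.0 * k_rf


if __name__ == "__main__":
    # Toy 1-D "features": one set close to the reference, one far away.
    reference = [[0.0], [0.1], [-0.1]]
    near = [[0.05], [-0.05], [0.0]]
    far = [[3.0], [3.1], [2.9]]
    print("near:", mmd2_unbiased(reference, near))
    print("far: ", mmd2_unbiased(reference, far))
```

In this toy run the set far from the reference yields a much larger value than the near one, which is the sense in which a lower KID or MMD is read as generated images being closer to the real ones.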

dress effect drawing; Hanfu; image generation; stable diffusion model; text-to-image generation

Li Zhi (李智), Chen Yu (陈郁)

School of Textiles and Fashion, Shanghai University of Engineering Science, Shanghai 201620, China

2024

Journal of Beijing Institute of Fashion Technology (Natural Science Edition)
Beijing Institute of Fashion Technology

Impact factor: 0.17
ISSN: 1001-0564
Year, volume (issue): 2024, 44(4)