
Image Generation Method Based on Multiple Text Descriptions

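This work addresses the low quality and structural errors of images generated from a single text description. It adopts a multi-stage generative adversarial network model. It also proposes interpolating between different text sequences, so that features extracted from multiple text descriptions enrich the given description and the generated images contain more detail. To generate images that are more relevant to the text, a multi-text deep attentional multimodal similarity model is introduced to obtain attention features. These features are combined with the visual features of the previous layer to form the input of the next layer, which improves both the realism of the generated images and their semantic consistency with the text descriptions. A self-attention mechanism is also introduced so that the model learns to coordinate the details at every position, letting the generator produce images that better match real scenes. The optimized model is validated on the CUB and MS-COCO datasets. The generated images are structurally complete, more semantically consistent, and visually richer and more diverse.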
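The abstract does not specify how the text sequences are interpolated. The sketch below is a minimal illustration under stated assumptions. Each caption of an image is assumed to be already encoded into a fixed-length sentence vector. These vectors are then mixed with a random convex combination, similar in spirit to the text-embedding interpolation used in earlier text-to-image GANs such as GAN-INT-CLS. The helper name `interpolate_captions` and its parameters are hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]


def interpolate_captions(
    embeddings: Sequence[Vector],
    weights: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Blend several caption embeddings of one image into one vector.

    Hypothetical helper: returns sum_i w_i * e_i with w_i >= 0 and sum_i w_i = 1.
    If no weights are given, they are drawn at random, so each call (e.g. each
    training step) sees a different mix of the captions.
    """
    if not embeddings:
        raise ValueError("need at least one caption embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    if weights is None:
        if rng is None:
            rng = random.Random()
        weights = [rng.random() for _ in embeddings]
    if len(weights) != len(embeddings):
        raise ValueError("one weight per embedding is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    # Normalize so the result lies in the convex hull of the input embeddings.
    norm = [w / total for w in weights]
    return [sum(w * e[d] for w, e in zip(norm, embeddings)) for d in range(dim)]
```

For example, `interpolate_captions([[0.0, 0.0], [2.0, 4.0]], weights=[1, 1])` returns the midpoint `[1.0, 2.0]`. Normalizing the weights keeps the blended vector among the caption embeddings rather than letting it drift to an arbitrary scale. Where the blended vector enters the multi-stage generator is not stated in the abstract.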
Image synthesis method based on multiple text descriptions
To address the low quality and structural errors of images generated from a single text description, this study adopted a multi-stage generative adversarial network model. It proposed interpolating different text sequences, extracting features from multiple text descriptions to enrich the given description and give the generated images greater detail. To strengthen the correlation between the generated images and the corresponding text, a multi-caption deep attentional multi-modal similarity model was introduced to capture attention features. These features were then combined with the visual features from the preceding layer to form the input of the next layer. This integration improved the realism of the generated images and their semantic consistency with the text descriptions. In addition, a self-attention mechanism was incorporated so that the model could effectively coordinate the details at each position, yielding images more consistent with real-world scenes. The optimized model was validated on the CUB and MS-COCO datasets, where it generated images with intact structures, stronger semantic consistency, and richer visual diversity.
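The two attention components described in the abstract can be illustrated with a small pure-Python sketch. It assumes, hypothetically, that visual features are flattened to one row per spatial position and that word features are already projected to the same dimension. The names `attend`, `word_context_fusion` and `self_attention`, and the weight matrices, are illustrative and do not come from the paper.

`word_context_fusion` follows the AttnGAN pattern: it computes a word-context vector for each image region and concatenates it with the previous stage's visual feature. It does not implement the similarity loss of the multi-caption DAMSM itself. `self_attention` is a simplified SAGAN-style layer without the final output projection, using a learnable scale `gamma`.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(rows x inner) @ (inner x cols) with plain lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Dot-product attention: out_i = sum_j softmax_j(q_i . k_j) * v_j."""
    attn = [softmax([sum(a * b for a, b in zip(q, k)) for k in keys]) for q in queries]
    return matmul(attn, values), attn


def word_context_fusion(h: Matrix, words: Matrix) -> Matrix:
    """Cross-attention from image regions to words (AttnGAN-style sketch).

    h: N x D visual features from the previous stage (one row per region).
    words: T x D word features, assumed already projected to dimension D.
    Each region i gets a word-context vector c_i; the next stage receives the
    concatenation [h_i ; c_i] of length 2D.
    """
    context, _ = attend(h, words, words)
    return [list(hi) + list(ci) for hi, ci in zip(h, context)]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                   gamma: float) -> Tuple[Matrix, Matrix]:
    """Simplified SAGAN-style self-attention over N flattened positions.

    x: N x C features; w_q, w_k: C x C' projections; w_v: C x C projection.
    Returns y = x + gamma * attention(x) and the N x N attention map, where
    attn[i][j] is how strongly position i draws on position j.
    """
    out, attn = attend(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
    y = [[xi + gamma * oi for xi, oi in zip(xr, orow)] for xr, orow in zip(x, out)]
    return y, attn
```

With `gamma = 0` the self-attention layer returns its input unchanged. SAGAN initializes `gamma` to 0 for this reason and lets training decide how much long-range evidence to mix in. In a real generator these loops would be replaced by batched tensor operations.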

Keywords: text-to-image; generative adversarial network; computer vision; semantic consistency; self-attention

Nie Kaiqin (聂开琴), Ni Zhengwei (倪郑威)


School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, Zhejiang, China


Funding: Natural Science Foundation of Zhejiang Province (LQ22F010008)

2024

Telecommunications Science
China Institute of Communications; Posts & Telecom Press


Indexed in: CSTPCD; Peking University core journal list
Impact factor: 0.902
ISSN: 1000-0801
Year, Volume (Issue): 2024, 40(5)