
Purging diffusion models through CLIP-based fine-tuning

Diffusion models have revolutionized text-to-image synthesis, enabling end users to generate high-quality, imaginative artwork from simple natural-language prompts. Unfortunately, because their training datasets are large and unfiltered, these models can generate inappropriate content such as nudity and violence. To deploy such models more safely, we propose directional CLIP (contrastive language-image pre-training) loss-based fine-tuning, dubbed CLIF, which fine-tunes the model with a directional CLIP loss to suppress its ability to generate inappropriate content. CLIF is computationally lightweight and, because it alters the model weights themselves, is immune to circumvention. To evaluate this suppression, we also propose categorized toxic prompts (CTP), a benchmark that measures a text-to-image diffusion model's ability to generate inappropriate content. Experiments on CTP and COCO (common objects in context) show that CLIF significantly suppresses unsafe generation while preserving the model's ability to produce general content.
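The abstract does not give CLIF's exact objective, but a directional CLIP loss, as popularized in editing methods such as StyleGAN-NADA, typically penalizes misalignment between the direction of change in CLIP image embeddings and the direction of change in CLIP text embeddings. The sketch below shows that general form only; the function name, the source/target prompt pairing, and the OpenAI-CLIP-style `encode_image`/`encode_text` interface are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a directional CLIP loss (general form, not the paper's
# exact CLIF objective). Assumes a CLIP model exposing encode_image /
# encode_text, as in OpenAI's CLIP (https://github.com/openai/CLIP).
import torch
import torch.nn.functional as F

def directional_clip_loss(clip_model, src_images, tgt_images,
                          src_tokens, tgt_tokens):
    """1 - cosine similarity between the image-embedding shift
    (src -> tgt images) and the text-embedding shift (src -> tgt prompts)."""
    with torch.no_grad():  # text directions serve as fixed targets
        t_src = F.normalize(clip_model.encode_text(src_tokens).float(), dim=-1)
        t_tgt = F.normalize(clip_model.encode_text(tgt_tokens).float(), dim=-1)
    i_src = F.normalize(clip_model.encode_image(src_images).float(), dim=-1)
    i_tgt = F.normalize(clip_model.encode_image(tgt_images).float(), dim=-1)

    d_img = F.normalize(i_tgt - i_src, dim=-1)  # how the images moved
    d_txt = F.normalize(t_tgt - t_src, dim=-1)  # how the prompts moved
    return (1.0 - (d_img * d_txt).sum(dim=-1)).mean()
```

In a safety fine-tuning setting, one plausible use is to pair a toxic source prompt with a benign target prompt and optimize the generator so its outputs move along the benign text direction; this pairing is likewise an assumption based on the abstract's description, not a detail given in it.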

Keywords: text-to-image generative models; security; datasets; diffusion models

吴平、林欣


School of Computer Science and Technology, East China Normal University, Shanghai 200062, China


2025

Journal of East China Normal University (Natural Science)
East China Normal University


Indexed in: Peking University Core Journals (北大核心)
Impact factor: 0.55
ISSN: 1000-5641
Year, volume (issue): 2025(1)