Diffusion models have transformed text-to-image generation, allowing end users to produce high-quality, diverse artworks from simple natural-language prompts. However, because they are trained on large, unfiltered datasets, text-to-image generation models are capable of producing inappropriate content such as pornography and violence. To deploy such models more safely, we propose a directional CLIP (contrastive language-image pre-training) loss based fine-tuning algorithm, CLIF, which fine-tunes the model with a directional CLIP loss to suppress its ability to generate inappropriate content. CLIF consumes little computation and its effect cannot be circumvented once applied. To evaluate this suppression, we propose CTP (categorized toxic prompts), a benchmark for measuring the inappropriate-content generation ability of text-to-image generation models. Experimental results on CTP and COCO (common objects in context) show that CLIF suppresses unsafe generation in text-to-image diffusion models without degrading their general generation ability.
Purging diffusion models through CLIP-based fine-tuning
Diffusion models have revolutionized text-to-image synthesis, enabling users to generate high-quality and imaginative artworks from simple natural-language prompts. Unfortunately, because the training datasets are large and unfiltered, these models can generate inappropriate content such as nudity and violence. To deploy such models more safely, we propose a novel method, directional contrastive language-image pre-training (CLIP) loss based fine-tuning, dubbed CLIF, which uses a directional CLIP loss to suppress the model's ability to generate inappropriate content. CLIF is lightweight and immune to circumvention. To demonstrate its effectiveness, we propose a benchmark, categorized toxic prompts (CTP), to evaluate the inappropriate-content generation ability of text-to-image diffusion models. Experiments on CTP and the common objects in context (COCO) dataset show that CLIF significantly suppresses inappropriate generation while preserving the model's ability to produce general content.
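For context, a directional CLIP loss, as commonly defined in prior CLIP-guided editing work, aligns the change in CLIP image embeddings with the change in CLIP text embeddings. The sketch below is only an assumed illustration of that standard form, not the exact CLIF objective; the symbols E_T, E_I, y_src, y_tgt, x_src, and x_tgt are introduced here for illustration (text and image encoders, source/target prompts, and the corresponding generated images).

% Illustrative sketch of a directional CLIP loss (assumed standard form; CLIF's formulation may differ).
\begin{align}
  \Delta T &= E_T(y_{\mathrm{tgt}}) - E_T(y_{\mathrm{src}}), \\
  \Delta I &= E_I(x_{\mathrm{tgt}}) - E_I(x_{\mathrm{src}}), \\
  \mathcal{L}_{\mathrm{dir}} &= 1 - \frac{\Delta I \cdot \Delta T}{\lVert \Delta I \rVert\,\lVert \Delta T \rVert}.
\end{align}

Minimizing this cosine-distance term encourages the fine-tuned model's outputs to shift away from the toxic concept and toward the safe target concept in CLIP embedding space.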