Research on food image generation based on diffusion models
Research on food image generation has primarily focused on synthesizing meal images from a given set of ingredients, a task that falls under text-to-image generation. However, owing to the visual complexity of dietary images, efforts to generate realistic food images have not yet been fully successful. Existing methods employ generative adversarial networks (GANs) conditioned on ingredient and cooking information to progressively generate high-quality samples; however, GANs may fail to cover the full data distribution, making it difficult to conditionally generate high-quality images. Diffusion models, a class of likelihood-based models, have recently been shown to generate high-quality images while offering desirable properties such as distribution coverage, a fixed training objective, and ease of scaling. This paper explores cross-modal information association and guidance in diffusion models to generate high-quality food images conditioned on category information. Results on the Recipe1M dataset demonstrate a significant improvement in model performance over baseline methods.
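The abstract does not specify the guidance mechanism in detail; the sketch below is a minimal, hypothetical illustration of one common way to condition a diffusion model on category information, namely classifier-free guidance, where a single network is queried with and without the class label and the two noise predictions are combined. All names (ConditionalDenoiser, cfg_sample), the toy MLP denoiser, and the hyperparameters are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

NUM_CLASSES = 10          # toy value; a real model would use the dataset's category count
NULL_CLASS = NUM_CLASSES  # extra embedding slot reserved for the unconditional ("null") label
IMG_DIM = 32 * 32 * 3     # flattened toy image; a real model would use a U-Net on 2-D images

class ConditionalDenoiser(nn.Module):
    # Toy noise-predictor conditioned on a category embedding. Stands in for a
    # category-guided diffusion model; the MLP backbone is for illustration only.
    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.cls_emb = nn.Embedding(NUM_CLASSES + 1, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(IMG_DIM + emb_dim + 1, 512), nn.SiLU(),
            nn.Linear(512, IMG_DIM),
        )

    def forward(self, x, t, y):
        # Concatenate noisy image, class embedding, and timestep, then predict the noise.
        h = torch.cat([x, self.cls_emb(y), t[:, None].float()], dim=-1)
        return self.net(h)

@torch.no_grad()
def cfg_sample(model, y, steps=50, guidance=3.0):
    # DDPM-style ancestral sampling with classifier-free guidance: the guided noise
    # estimate interpolates past the unconditional prediction toward the conditional one.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    x = torch.randn(y.shape[0], IMG_DIM)
    for t in reversed(range(steps)):
        tt = torch.full((y.shape[0],), t)
        eps_c = model(x, tt, y)                               # conditional prediction
        eps_u = model(x, tt, torch.full_like(y, NULL_CLASS))  # unconditional prediction
        eps = eps_u + guidance * (eps_c - eps_u)              # guided noise estimate
        mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x

# Example: draw two samples conditioned on (hypothetical) category labels 3 and 7.
model = ConditionalDenoiser()
samples = cfg_sample(model, torch.tensor([3, 7]))

In such a scheme the null label is randomly substituted for the true label during training, so a single model learns both the conditional and unconditional score; at sampling time the guidance weight trades off sample fidelity to the category against diversity.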