FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation
Abstract
Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images. However, these generated images often exhibit an artistic or surreal quality that diverges from the authenticity of real-world food representations. This inadequacy renders them impractical for applications requiring realistic food imagery, such as training models for image-based dietary assessment. To address these limitations, we introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions. The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs. Additionally, we propose and employ two distinct data cleaning methodologies to ensure that the resulting image-text pairs maintain both realism and accuracy. The FoodFusion model, thus trained, demonstrates a remarkable ability to generate food images that exhibit a significant improvement in terms of both realism and diversity over the publicly available image generation models. We openly share the dataset and fine-tuned models to support advancements in this critical field of food image synthesis at https://bit.ly/genai4good.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2023
- DOI:
- 10.48550/arXiv.2312.03540
- arXiv:
- arXiv:2312.03540
- Bibcode:
- 2023arXiv231203540M
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition