Meta’s AI image generator CM3leon scores better than Google’s Parti (non-finetuned).
CM3leon (pronounced “chameleon”) is Meta’s new AI image generator. It handles both text-to-image and image-to-text generation. It is trained in two stages: large-scale pretraining followed by supervised fine-tuning (SFT).
AI image generators are commonly ranked by their Fréchet Inception Distance (FID) score, where lower is better. CM3leon scores 4.88, ahead of Google’s Parti (7.23), Imagen (7.27), DALL-E 2 (10.39), and Stable Diffusion (12.63). Notably, the finetuned version of Google’s Parti scores 3.22, which is the best so far[1].
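To give a feel for why lower is better: FID fits a Gaussian to the Inception-network features of real images and another to those of generated images, then measures the distance between the two. The sketch below is a simplified illustration, not the procedure used in the paper. It assumes the features are independent (diagonal covariance), which reduces the formula to a few sums, whereas real FID uses full covariance matrices over high-dimensional Inception features. The function name and the toy numbers are made up for the example.

```python
import math


def frechet_distance_diag(mean_a, var_a, mean_b, var_b):
    """Squared Frechet distance between two Gaussians with diagonal covariance.

    Simplifying assumption: features are treated as independent, so each
    covariance matrix is represented only by its per-dimension variances.
    """
    if not (len(mean_a) == len(var_a) == len(mean_b) == len(var_b)):
        raise ValueError("all inputs must have the same length")
    # Squared distance between the two mean vectors.
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    # With diagonal covariances, the trace term reduces to the squared
    # difference of the per-dimension standard deviations.
    cov_term = sum(
        (math.sqrt(va) - math.sqrt(vb)) ** 2 for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term


# Toy 3-dimensional statistics (invented for illustration, not real features).
real_mean, real_var = [0.0, 1.0, 2.0], [1.0, 4.0, 9.0]
close_mean, close_var = [0.1, 1.0, 2.1], [1.0, 4.0, 9.0]
far_mean, far_var = [1.0, 3.0, 0.0], [4.0, 1.0, 1.0]

score_close = frechet_distance_diag(real_mean, real_var, close_mean, close_var)
score_far = frechet_distance_diag(real_mean, real_var, far_mean, far_var)

# The generator whose statistics sit closer to the real images scores lower.
print(score_close < score_far)
```

A generator whose feature statistics match the real images exactly would score 0, which is why a drop from 7.23 to 4.88 counts as an improvement.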
The research paper [PDF], “Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning,” further outlines the process and notes how autoregressive models can compete with and even exceed diffusion-based models.
You can read the official announcement here.
Apart from generating images from text prompts and writing text that describes an input image, CM3leon can also separate and identify the different parts of an image and produce higher-resolution results.
Notably, text-to-image generation, which is likely to be the model’s main use in AI pop culture, uses five times less compute than “previous transformer-based methods.” The model is also trained on a remarkably small dataset of 3B tokens. To put this in perspective, DALL-E 2 has 3.5B parameters, plus 1.5B more for increasing the resolution, although parameters measure model size rather than training data.
Meta has been going all in on AI. Its models are competing with the top models from bigger tech companies such as Microsoft and Google, and with this new notch on its belt, the company seems ready to tackle an entirely new market.