– Stability AI has launched Stable Diffusion XL Turbo, an AI image-synthesis model that can generate imagery based on a written prompt in real-time.
– The model uses a technique called Adversarial Diffusion Distillation (ADD), which reduces the number of steps required to produce image outputs.
– SDXL Turbo is similar to Generative Adversarial Networks (GANs) in producing single-step image outputs.
– The speed of SDXL Turbo allows for rapid image generation, with a 3-step 1024×1024 image generated in about 4 seconds on a consumer GPU (an RTX 3060).
– SDXL Turbo has the potential for real-time generative AI video filters or experimental video game graphics generation.
– The model is currently available under a non-commercial research license but Stability AI is open to commercial applications.
– Stability AI has faced internal management issues and is exploring a potential company sale, but continues to release new products and updates.
– There are beta demonstrations and live demos available to try out the capabilities of SDXL Turbo.
On Tuesday, Stability AI launched Stable Diffusion XL Turbo, an AI image-synthesis model that can rapidly generate imagery based on a written prompt. So rapidly, in fact, that the company is billing it as “real-time” image generation, since it can also quickly transform images from a source, such as a webcam.
SDXL Turbo’s primary innovation lies in its ability to produce image outputs in a single step, a significant reduction from the 20–50 steps required by its predecessor. Stability attributes this leap in efficiency to a technique it calls Adversarial Diffusion Distillation (ADD). ADD uses score distillation, where the model learns from existing image-synthesis models, and adversarial loss, which enhances the model’s ability to differentiate between real and generated images, improving the realism of the output.
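Reduced to scalars, the two terms described above can be sketched as a combined training objective. This is a toy illustration only: the function names and the weighting constant `LAMBDA` are hypothetical, not drawn from Stability’s paper.

```python
# Toy sketch of an ADD-style objective: a distillation term pulls the
# one-step student toward a many-step teacher, while an adversarial term
# rewards outputs the discriminator scores as "real".

LAMBDA = 2.5  # relative weight of the distillation term (illustrative)

def distillation_loss(student_out: float, teacher_out: float) -> float:
    # Score distillation: squared error between the one-step student's
    # prediction and the teacher model's denoised prediction.
    return (student_out - teacher_out) ** 2

def adversarial_loss(discriminator_score: float) -> float:
    # Hinge-style generator loss: zero once the discriminator is
    # confidently fooled (score >= 1), positive otherwise.
    return max(0.0, 1.0 - discriminator_score)

def add_loss(student_out: float, teacher_out: float,
             discriminator_score: float) -> float:
    # The student is trained on both signals at once.
    return (adversarial_loss(discriminator_score)
            + LAMBDA * distillation_loss(student_out, teacher_out))

print(add_loss(0.9, 1.0, 0.5))  # 0.5 + 2.5 * 0.01 = 0.525
```

The key idea is that the adversarial term compensates for the blurriness that pure distillation tends to produce at very low step counts.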
Stability detailed the model’s inner workings in a research paper released Tuesday that focuses on the ADD technique. One of the claimed advantages of SDXL Turbo is its similarity to Generative Adversarial Networks (GANs), especially in producing single-step image outputs.
SDXL Turbo images aren’t as detailed as SDXL images produced at higher step counts, so it’s not considered a replacement for the previous model. But for the speed savings involved, the results are eye-popping.
To try it out, we ran SDXL Turbo locally on an Nvidia RTX 3060 using Automatic1111 (the weights drop in just like SDXL weights), and it can generate a 3-step 1024×1024 image in about 4 seconds, versus 26.4 seconds for a 20-step SDXL image with similar detail. Smaller images generate much faster (under one second for 512×768), and of course, a beefier graphics card such as an RTX 3090 or 4090 will allow much quicker generation times as well. Contrary to Stability’s marketing, we’ve found that SDXL Turbo images have the best detail at around 3–5 steps per image.
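For those who prefer scripting over a web UI, the same checkpoint can be driven from Hugging Face’s diffusers library. A minimal sketch, assuming `diffusers`, `torch`, and a CUDA GPU are available (the prompt and settings below are illustrative):

```python
# Sketch: generating an image with SDXL Turbo via diffusers.
# The heavy imports are deferred into the function so the settings
# can be inspected without the dependencies installed.

PROMPT = "a photograph of a red fox in a snowy forest"
NUM_STEPS = 3          # 3-5 steps gave the best detail in our testing
GUIDANCE_SCALE = 0.0   # SDXL Turbo runs without classifier-free guidance

def generate(prompt: str = PROMPT):
    import torch
    from diffusers import AutoPipelineForText2Image

    # Download (or load from cache) the SDXL Turbo weights.
    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    result = pipe(
        prompt, num_inference_steps=NUM_STEPS, guidance_scale=GUIDANCE_SCALE
    )
    return result.images[0]
```

Usage would be `generate().save("fox.png")`. Note that `guidance_scale` is set to zero because the model is trained to work without classifier-free guidance, which is part of why each step is so cheap.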
SDXL Turbo’s generation speed is where the “real-time” claim comes in. Stability AI says that on an Nvidia A100 (a powerful AI-tuned GPU), the model can generate a 512×512 image in 207 ms, including encoding, a single de-noising step, and decoding. Speeds like that could lead to real-time generative AI video filters or experimental video game graphics generation, if coherency issues can be solved. In this context, coherency means maintaining the same subject between multiple frames or generations.
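As a back-of-the-envelope check on the “real-time” framing, the 207 ms figure works out to roughly five frames per second:

```python
# Convert Stability's quoted A100 latency for a 512x512 image
# (encode + one denoising step + decode) into a frame rate.

LATENCY_MS = 207

def frames_per_second(latency_ms: float) -> float:
    return 1000.0 / latency_ms

print(round(frames_per_second(LATENCY_MS), 1))  # 4.8
```

About 4.8 fps is choppy by video standards, but close enough that batching or faster decoding could plausibly push it into interactive territory.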
Currently, SDXL Turbo is available under a non-commercial research license, limiting its use to personal, non-commercial purposes. This move has already been met with some criticism in the Stable Diffusion community, but Stability AI has expressed openness to commercial applications and invites interested parties to get in touch for more information.
Meanwhile, Stability AI itself has faced internal management issues, with an investor recently urging CEO Emad Mostaque to resign. Stability management has reportedly been exploring a potential company sale to a larger entity, but that hasn’t slowed down Stability’s cadence of releases. Just last week, the firm announced Stable Video Diffusion, which can turn still images into short video clips.
Stability AI offers a beta demonstration of SDXL Turbo’s capabilities on its image-editing platform, Clipdrop. You can also experiment with an unofficial live demo on Hugging Face for free. Obviously all the usual caveats apply, including the lack of provenance for training data and the potential for misuse. Even with those unresolved issues, technological progress in AI image synthesis is certainly not slowing down.