Diffusion Models in AI – Everything You Need to Know

A collage of human faces created using an AI image generator

In the AI ecosystem, diffusion models are setting the direction and pace of technological advancement. They are revolutionizing the way we approach complex generative AI tasks. These models are built on the mathematics of Gaussian distributions, variance, differential equations, and generative sequences. (We’ll explain the technical jargon below.)

Modern AI-centric products and solutions developed by Nvidia, Google, Adobe, and OpenAI have put diffusion models in the limelight. DALL·E 2, Stable Diffusion, and Midjourney are prominent examples of diffusion models that have been making the rounds on the internet recently. Users provide a simple text prompt as input, and these models convert it into realistic images, such as the one shown below.

An image generated with Midjourney v5 using input prompt: vibrant California poppies. Source: Midjourney

Let’s explore the fundamental working principles of diffusion models and how they are changing the directions and norms of the world as we see it today.

What Are Diffusion Models?

According to the research publication “Denoising Diffusion Probabilistic Models,” diffusion models are defined as:

“A diffusion model or probabilistic diffusion model is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time”

Simply put, diffusion models can generate data similar to the ones they are trained on. If the model trains on images of cats, it can generate similar realistic images of cats.

Now let’s try to break down the technical definition mentioned above. Diffusion models take inspiration from the working principle and mathematical foundation of probabilistic models that analyze and predict the behavior of a system that varies with time, such as stock market returns or the spread of a pandemic.

The definition states that they are parameterized Markov chains trained with variational inference. Markov chains are mathematical models that describe a system that switches between different states over time. The probability of transitioning to a specific state depends only on the system’s current state, not on the states that came before it. In other words, the current state alone determines which states the system can move to next and how likely each one is.
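
To make the Markov property concrete, here is a minimal sketch in plain Python. The two weather states and their transition probabilities are made up for illustration and have nothing to do with images; the point is only that the hypothetical step function looks at the current state and nothing else.

```python
import random

# Hypothetical two-state chain; states and probabilities are illustrative only.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state, rng):
    """Sample the next state using only the current state (Markov property)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]

def simulate(start, n_steps, seed=0):
    """Return a trajectory of n_steps + 1 states that begins at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path

trajectory = simulate("sunny", 10)
print(trajectory)
```

A diffusion model follows the same pattern, except that each state is an entire image and each transition adds or removes a small amount of noise.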

Training the model using variational inference involves complex calculations over probability distributions. It aims to find the parameters of the Markov chain that best match the observed (known or actual) data after a specific time. Because the exact likelihood of the data is intractable, training instead minimizes a loss function that approximates it, which measures the mismatch between the predicted (unknown) and observed (known) states.

Once trained, the model can generate samples matching the observed data. These samples represent possible trajectories or states the system could follow or acquire over time, and each trajectory has a different probability of happening. Hence, the model can predict the system’s future behavior by generating a range of samples and finding their respective probabilities (the likelihood of these events happening).
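
For readers who want the math, the training objective can be written as follows. The notation is borrowed from the Denoising Diffusion Probabilistic Models paper rather than from this article: x_0 is a data sample, x_1 to x_T are its increasingly noisy versions, q is the fixed forward (noising) chain, and p_θ is the learned reverse chain with parameters θ.

```latex
\mathbb{E}\left[-\log p_\theta(x_0)\right]
\;\le\;
\mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right]
\;=:\; L
```

Training minimizes L, the right-hand side, which is an upper bound on the negative log-likelihood of the data.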

How to Interpret Diffusion Models in AI?

Diffusion models are deep generative models that work by adding noise (Gaussian noise) to the available training data (known as the forward diffusion process) and then reversing the process (known as denoising or the reverse diffusion process) to recover the data. The model gradually learns to remove the noise. Once learned, this denoising process can generate new, high-quality images from random seeds (images of pure random noise), as shown in the illustration below.
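
The forward (noising) half of this process needs no training, so it is easy to sketch. The snippet below is a minimal illustration in plain Python, assuming the linear noise schedule reported in the DDPM paper (1,000 steps, noise variance rising from 0.0001 to 0.02). The function names and the eight-value toy "image" are hypothetical.

```python
import math
import random

def linear_betas(n_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly from beta_start to beta_end."""
    span = beta_end - beta_start
    return [beta_start + span * t / (n_steps - 1) for t in range(n_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta): the share of signal left at each step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Jump straight to step t: sqrt(a) * x0 + sqrt(1 - a) * Gaussian noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

T = 1000
alpha_bars = cumulative_alphas(linear_betas(T))
rng = random.Random(0)
x0 = [1.0] * 8  # toy "image" with eight pixels, all equal to 1.0

for t in (0, 499, 999):
    xt = q_sample(x0, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in xt])
```

As t grows, the signal share shrinks toward zero, so the last row printed is almost pure noise with no trace of the original pixel values.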

Reverse diffusion process: A noisy image is denoised to recover the original image (or generate its variations) via a trained diffusion model. Source: Denoising Diffusion Probabilistic Models

3 Diffusion Model Categories

There are three fundamental mathematical frameworks that underpin the science behind diffusion models. All three work on the same principles of adding noise and then removing it to generate new samples. Let’s discuss them below.

A diffusion model adds and removes noise from an image. Source: Diffusion Models in Vision: A Survey

1. Denoising Diffusion Probabilistic Models (DDPMs)

As explained above, DDPMs are generative models that learn to remove noise from visual or audio data step by step. They have shown impressive results on various image and audio denoising tasks. For instance, the filmmaking industry uses modern image and video processing tools to improve production quality.
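
The reverse (denoising) chain of a DDPM can be sketched in a few lines of plain Python. This is only an illustration under strong assumptions: the "data" is a one-dimensional Gaussian (DATA_MEAN and DATA_STD are made-up values), so the ideal noise prediction has a closed form, and the hypothetical predict_noise function stands in for the neural network a real DDPM would train. The noise schedule follows the linear one reported in the DDPM paper.

```python
import math
import random

T = 1000
# Linear noise schedule from 1e-4 to 0.02 over 1,000 steps.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars, prod = [], 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

# Assumption: the training data is a 1-D Gaussian, so no network is needed.
DATA_MEAN, DATA_STD = 3.0, 0.5

def predict_noise(x_t, t):
    """Closed-form expected noise given x_t; stands in for a trained network."""
    a_bar = alpha_bars[t]
    var_t = a_bar * DATA_STD ** 2 + (1.0 - a_bar)
    return math.sqrt(1.0 - a_bar) * (x_t - math.sqrt(a_bar) * DATA_MEAN) / var_t

def sample(rng):
    """Start from pure noise and denoise one step at a time."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = (x - scale * eps) / math.sqrt(alphas[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # last step adds no noise
        x = mean + math.sqrt(betas[t]) * noise
    return x

rng = random.Random(0)
samples = [sample(rng) for _ in range(500)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.2f} std={std:.2f}")
```

If the chain works as intended, the printed mean and standard deviation should land close to 3.0 and 0.5, the statistics of the assumed data distribution, even though every sample started as unrelated random noise.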

2. Noise-Conditioned Score-Based Generative Models (SGMs)

SGMs can generate new samples from a given distribution. They work by learning a score function, which estimates the gradient of the log density of the target distribution. Log density estimation treats the available data points as samples drawn from an unknown underlying distribution. The learned score function can then be used to generate new data points from that distribution.

For instance, deepfakes are notorious for producing fake videos and audio of famous personalities, and they are mostly attributed to Generative Adversarial Networks (GANs). However, SGMs have shown similar capabilities, and at times outperform GANs, in generating high-quality celebrity faces. Also, SGMs can help expand healthcare datasets, which are not readily available in large quantities due to strict regulations and industry standards.
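
One common way to turn a score function into samples is Langevin dynamics: repeatedly nudge a point along the score and add a little noise. The sketch below is a simplified illustration, not the full noise-conditioned method. It assumes a one-dimensional Gaussian target (MU and SIGMA are made-up values) whose score is known in closed form, whereas a real SGM learns the score with a neural network and anneals across several noise levels.

```python
import math
import random

MU, SIGMA = 2.0, 1.0  # hypothetical target distribution N(MU, SIGMA^2)

def score(x):
    """Gradient with respect to x of the log density of N(MU, SIGMA^2)."""
    return -(x - MU) / SIGMA ** 2

def langevin_sample(rng, n_steps=1000, step_size=0.01):
    """Follow the score from an arbitrary start while injecting Gaussian noise."""
    x = rng.uniform(-5.0, 5.0)
    for _ in range(n_steps):
        x += 0.5 * step_size * score(x) + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [langevin_sample(rng) for _ in range(500)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.2f} std={std:.2f}")
```

The sampler never evaluates the density itself, only its score, which is why learning the score is enough to generate new data.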

3. Stochastic Differential Equations (SDEs)

SDEs describe how random processes change over time. They are widely used in physics and in financial markets, where random factors significantly impact outcomes. In diffusion models, the gradual noising of data can be written as an SDE, and new samples are generated by solving the corresponding SDE in reverse time.

For instance, the prices of commodities are highly dynamic and impacted by a range of random factors. SDEs are used to price financial derivatives such as futures contracts (for example, crude oil futures). They can model the fluctuations and help calculate favorable prices accurately to give a sense of security.
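
As a rough illustration of the commodity example, the sketch below simulates a price with geometric Brownian motion, a textbook SDE of the form dS = mu * S dt + sigma * S dW, using the Euler-Maruyama method. The starting price, drift, and volatility are made-up values, and the function name simulate_gbm is hypothetical.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, horizon, n_steps, rng):
    """Euler-Maruyama path of dS = mu*S dt + sigma*S dW; returns every price."""
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        s = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment with variance dt
        path.append(s + mu * s * dt + sigma * s * dw)
    return path

rng = random.Random(0)
one_path = simulate_gbm(100.0, 0.05, 0.2, 1.0, 100, rng)
finals = [simulate_gbm(100.0, 0.05, 0.2, 1.0, 100, rng)[-1] for _ in range(2000)]
avg_final = sum(finals) / len(finals)

# For this SDE the exact expected final price is s0 * exp(mu * horizon).
print(round(avg_final, 2), round(100.0 * math.exp(0.05), 2))
```

Each path is different because of the random term, but the average over many paths should approach the exact expectation. Diffusion models use the same kind of step-by-step simulation, applied to images instead of prices.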

Major Applications of Diffusion Models in AI

Let’s look at some widely adopted practices and uses of diffusion models in AI.

High-Quality Video Generation

Creating high-end videos using deep learning is challenging because it requires high continuity between video frames. This is where diffusion models come in handy: they can generate a subset of video frames to fill in the missing frames, resulting in high-quality, smooth videos without noticeable stutter.

Researchers have developed the Flexible Diffusion Model and Residual Video Diffusion techniques to serve this purpose. These models can also produce realistic videos by seamlessly adding AI-generated frames between the actual frames.

These models can increase the frame rate (FPS, frames per second) of a low-FPS video by adding synthetic frames after learning the patterns from the available frames. With almost no frame loss, these frameworks can further assist deep learning-based models in generating AI-based videos from scratch that look like natural shots from high-end camera setups.

A wide range of remarkable AI video generators is available in 2023 to make video content production and editing quick and straightforward.

Text-to-Image Generation

Text-to-image models use input prompts to generate high-quality images. For instance, given the input “red apple on a plate,” such a model produces a photorealistic image of an apple on a plate. Blended diffusion and unCLIP are two prominent examples of such models that can generate highly relevant and accurate images based on user input.

GLIDE by OpenAI, released in 2021, is another widely known solution that produces photorealistic images from user input. Later, OpenAI released DALL·E 2, its most advanced image generation model yet.

Similarly, Google has also developed an image generation model known as Imagen, which uses a large language model to develop a deep textual understanding of the input text and then generates photorealistic images.

We have mentioned other popular image-generation tools like Midjourney and Stable Diffusion (DreamStudio) above. Have a look at an image generated using Stable Diffusion below.

A collage of human faces created with Stable Diffusion 1.5

An image created with Stable Diffusion 1.5 using the following prompt: “collages, hyper-realistic, many variations portrait of very old thom yorke, face variations, singer-songwriter, ( side ) profile, various ages, macro lens, liminal space, by lee bermejo, alphonse mucha and greg rutkowski, greybeard, smooth face, cheekbones”

Diffusion Models in AI – What to Expect in the Future?

Diffusion models have revealed promising potential as a robust approach to generating high-quality samples from complex image and video datasets. By improving human capability to use and manipulate data, diffusion models can potentially revolutionize the world as we see it today. We can expect to see even more applications of diffusion models becoming an integral part of our daily lives.

Having said that, diffusion models are not the only generative AI technique. Researchers also use Generative Adversarial Networks (GANs), Variational Autoencoders, and flow-based deep generative models to generate AI content. Understanding the fundamental characteristics that differentiate diffusion models from other generative models can help produce more effective solutions in the coming days.

To learn more about AI-based technologies, visit Unite.ai.

The post Diffusion Models in AI – Everything You Need to Know appeared first on Unite.AI.