05b – THE BREAKTHROUGHS
DDPM and Stable Diffusion – What Are They, Really?
DDPM – The Moment It All Clicked (2020)
The idea of reversing diffusion had existed since 2015, but the results were blurry and unconvincing. Then in 2020, Jonathan Ho, Ajay Jain, and Pieter Abbeel at UC Berkeley published a paper called "Denoising Diffusion Probabilistic Models" – DDPM for short – and everything changed.
Their insight was beautifully simple: don't try to predict the clean image directly. Instead, just predict the noise.
Here's what that means. Take a photo. Add a known amount of Gaussian noise to it. Now show the noisy result to a neural network and ask: "What noise was added?" If the network can answer that correctly, you can subtract the predicted noise and recover a slightly cleaner image. Repeat this 1,000 times, starting from pure static, and a brand-new image appears from nothing.
The training objective turned out to be absurdly simple – just a mean squared error:
Loss = ‖ ε − ε_θ(x_t, t) ‖²
In plain English: measure how far off the network's noise prediction (ε_θ) is from the actual noise (ε) that was added. That's it. This one equation trained a model that produced images rivaling GANs for the first time – and it was far more stable to train.
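To make this concrete, here is a minimal training-step sketch in PyTorch. It is an illustration, not the paper's code: `model` is a hypothetical stand-in for the U-Net, taking a noisy image and a timestep and returning a noise prediction of the same shape. It uses DDPM's closed-form shortcut for jumping straight to any noise level in one step.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # linear noise schedule from the paper
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal-retention factors

def training_loss(model, x0):
    """One DDPM training step on a batch of clean images x0."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                 # a random timestep per image
    eps = torch.randn_like(x0)                    # the noise we add (and must recover)
    ab = alphas_bar[t].view(b, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps  # noisy image, in one closed-form step
    eps_pred = model(x_t, t)                      # "what noise was added?"
    return F.mse_loss(eps_pred, eps)              # the one-line objective above
```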
The catch? DDPM was slow. Generating one image required running the neural network 1,000 times in sequence, each time denoising a little more. A single image could take minutes on a powerful GPU.
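The slowness is visible in the sampling loop itself: every one of the 1,000 steps needs a fresh forward pass, and each step depends on the previous one, so nothing can run in parallel. A sketch, reusing the schedule variables from the training snippet above (it takes the per-step variance equal to the schedule value, one of the options the paper allows):

```python
@torch.no_grad()
def ddpm_sample(model, shape):
    """Generate images with 1,000 sequential network calls."""
    x = torch.randn(shape)                               # start from pure static
    for t in reversed(range(T)):
        eps_pred = model(x, torch.full((shape[0],), t))  # predict the added noise
        alpha, ab = 1.0 - betas[t], alphas_bar[t]
        # Remove this step's share of the predicted noise
        x = (x - betas[t] / (1 - ab).sqrt() * eps_pred) / alpha.sqrt()
        if t > 0:                                        # re-inject a little noise, except at the end
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```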
Stable Diffusion – Making It Fast and Free (2022)
Stable Diffusion is the name of a specific open-source model created by a team at LMU Munich (Robin Rombach, Andreas Blattmann, and others) in collaboration with Stability AI. It solved DDPM's speed problem with one brilliant idea:
Don't diffuse in pixel space. Diffuse in a compressed space – the machine's unconscious.
A 512×512 image has 786,432 pixel values. Running diffusion on all of them is expensive. So Stable Diffusion first compresses the image into a tiny latent representation – just 64×64×4 = 16,384 numbers – using a pre-trained autoencoder. Then it runs the diffusion process (forward and reverse) entirely in this compressed space. Finally, it decodes the result back into a full image.
[Diagram: 512×512 image → encode → 64×64 latent (48× smaller) → diffuse → 64×64 denoised latent → decode → 512×512 image]
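In code, generation becomes the same reverse loop as DDPM, just run on a 64×64×4 tensor and steered by a text embedding. A sketch reusing the schedule variables from the DDPM snippet; `unet` and `vae_decode` are hypothetical stand-ins for the pre-trained denoiser and the autoencoder's decoder:

```python
@torch.no_grad()
def latent_sample(unet, vae_decode, text_emb):
    """Stable Diffusion-style loop: diffuse 16,384 numbers instead of 786,432."""
    z = torch.randn(1, 4, 64, 64)                  # noise in latent space, not pixel space
    for t in reversed(range(T)):
        eps = unet(z, t, text_emb)                 # hypothetical text-conditioned U-Net
        alpha, ab = 1.0 - betas[t], alphas_bar[t]
        z = (z - betas[t] / (1 - ab).sqrt() * eps) / alpha.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return vae_decode(z)                           # tiny latent back to a full 512×512 image
```

(In practice the released model also swaps in a faster sampler such as DDIM, so this loop runs for around 50 steps rather than 1,000.)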
This made generation ~50 times faster. But there is something poetic about this latent space: it is a vast mathematical ocean where every concept humanity has ever photographed – every face, landscape, animal, texture – exists as a point. It is, in a very real sense, the machine's collective unconscious.

But that's only half the story. Stable Diffusion also added a text encoder (CLIP) that understands language. When you type "a cat sitting on a mountain at sunset," CLIP translates those words into a numerical vector that guides the denoising process at every step – like dropping a pebble into the machine's unconscious ocean, creating ripples that steer the noise toward an image matching your words. The prompt is the trigger. The latent space is the unconscious. The denoised image is the dream.
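That "numerical vector" is easy to inspect. A short sketch using the Hugging Face transformers library and the CLIP text encoder that Stable Diffusion v1 ships with (the checkpoint id is an assumption based on the public release):

```python
from transformers import CLIPTokenizer, CLIPTextModel

# CLIP text encoder used by Stable Diffusion v1 (checkpoint id may vary)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(
    "a cat sitting on a mountain at sunset",
    padding="max_length", max_length=77, return_tensors="pt",
)
text_emb = encoder(tokens.input_ids).last_hidden_state
print(text_emb.shape)  # torch.Size([1, 77, 768]): the ripples that steer every denoising step
```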
The result: type a sentence, wait a few seconds, get a photorealistic image. And because Stability AI released the model weights openly, anyone in the world could use it, modify it, and build on it – triggering the explosion of AI-generated art you see today.
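Because the weights are public, the whole pipeline fits in a few lines with the diffusers library. A usage sketch (the exact hub id of the v1.5 checkpoint has moved over time, so treat it as an assumption):

```python
import torch
from diffusers import StableDiffusionPipeline

# Download the openly released weights and generate an image from a sentence.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint id; mirrors exist
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a cat sitting on a mountain at sunset").images[0]
image.save("cat_on_mountain.png")
```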
The Family Tree
How they all fit together:
2015
Diffusion Models (Sohl-Dickstein et al.) – first proof that reversing diffusion works, but blurry results
2020
DDPM (Ho et al.) – "just predict the noise" – first sharp, high-quality images
2021
DDIM, Guided Diffusion – faster sampling, text/class guidance
2022
Stable Diffusion (Rombach et al.) – latent space + text encoder + open-source – the revolution
2023–
DALL·E 3, Midjourney v5, SDXL, Sora – same core idea, scaled up to video and beyond