
Does a Machine Think?


A Mathematical Inquiry into AI · Part VII


If "thinking" is nothing but matrix multiplication, then what is thought?

Let us begin with the most uncomfortable fact. Everything AI does that looks like "thought" (writing poetry, proving theorems, appearing to empathize with your feelings) is, at its physical core, matrix multiplication: thousands of chips performing multiplications and additions simultaneously. That is all. If this disturbs you, that is precisely where this exploration begins.

00 — AN UNCOMFORTABLE STARTING POINT

Does Thought Happen Step by Step, or All at Once?

When you solve a math problem, you think step by step. But the moment you see a landscape, billions of neurons fire simultaneously. A CPU works like the first; a GPU works like the second. And for AI's "thinking," the second wins by an overwhelming margin.

[Interactive demo: one genius vs. thousands of workers. A CPU (4 cores, sequential) races a GPU (hundreds of cores, parallel), with live progress bars and a speedup readout.]

Here is the uncomfortable question. When AI writes a poem, what happens inside is a 4096×4096 matrix multiplication: 68 billion multiply-adds. This is the physical substance of "creativity." On a CPU, one operation at a time, it takes seconds; on a GPU, in parallel, milliseconds. This speed difference is what separates "AI that converses in real time" from "AI that takes 10 minutes per sentence."
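The 68-billion figure is simple arithmetic: multiplying two n×n matrices takes n multiply-adds for each of the n² output elements. A quick check for the 4096 dimension quoted above:

```python
# Multiply-add count for C = A @ B with A, B both n x n:
# each of the n*n output elements is a dot product of length n,
# i.e. n multiply-add operations, so n**3 in total.
n = 4096
macs = n * n * n
print(f"{macs:,} multiply-adds")  # 68,719,476,736 -> ~68.7 billion
```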

01 — THE IDENTITY OF EVERYTHING

Matrix Multiplication: The Physical Identity of Thought

The attention from Part II, the GANs of Part IV, the speech synthesis of Part V, the image recognition of Part VI: the operation underlying everything we have explored in this series is one thing. Matrix multiplication. C = A × B. This is the atom of machine thought.

[Interactive demo: matrix multiplication as row × column dot products. A of size 4×3 times B of size 3×4; a tone sounds as each of the 48 operations completes.]
C[i,j] = Σ_k A[i,k] · B[k,j]. Each element is independent, hence perfectly parallelizable.
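A minimal sketch of the formula above in plain Python. The key observation: the loop body for each (i, j) reads only row i of A and column j of B, and never another output cell, which is exactly the independence that lets a GPU compute every element of C at once. With the 4×3 and 3×4 sizes from the demo, the counter lands on the same 48 multiply-adds:

```python
def matmul(A, B):
    """Naive C[i][j] = sum_k A[i][k] * B[k][j], counting multiply-adds."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    ops = 0
    for i in range(rows):          # each (i, j) pair is independent:
        for j in range(cols):      # a GPU assigns one thread per output cell
            for k in range(inner):
                C[i][j] += A[i][k] * B[k][j]
                ops += 1
    return C, ops

A = [[1, 2, 3]] * 4                # 4x3
B = [[1, 0, 0, 1]] * 3             # 3x4
C, ops = matmul(A, B)
print(ops)                         # 4 outputs x 4 outputs x 3 terms = 48
```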

02 — AN ACCIDENTAL REVOLUTION

GPU Architecture

The great irony of the AI revolution: the chip that made it all possible was built for gaming. Teenagers bought graphics cards for more realistic explosions and shadows; nobody imagined those cards would reshape the intellectual history of humanity.

[Diagram: CPU vs GPU core count and structure]

NVIDIA GPU Evolution

| GPU | Year | CUDA cores | AI impact |
| --- | --- | --- | --- |
| GTX 580 | 2010 | 512 | Used to train AlexNet |
| K80 | 2014 | 4,992 | First datacenter AI GPU |
| V100 | 2017 | 5,120 + 640 Tensor | Transformer training standard |
| A100 | 2020 | 6,912 + 432 Tensor | GPT-3 training |
| H100 | 2022 | 16,896 + 528 Tensor | GPT-4, Claude training |
| B200 | 2024 | 18,432 + 1,152 Tensor | Next-gen model training |
| B300 (Blackwell Ultra) | 2025 | Undisclosed | 5th-gen NVLink, peak performance |

03 — THE PRICE OF A THOUGHT

FLOPS: How Much Does One "Thought" Cost?

Human thought feels free. Machine "thought" has an exact price tag. Training GPT-4 required approximately 10²⁵ operations, at an estimated cost of $100 million. Even when you ask an AI "what's the weather today?", billions of matrix multiplications execute. Every thought has a cost in silicon and electricity.

[Chart: computation required for AI training, log scale]
GPT-4 training ≈ 2 × 10²⁵ FLOPs ≈ 25,000 H100s × 3 months
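The "25,000 H100s × 3 months" figure is consistent with the 2 × 10²⁵ FLOPs total, but only under a modest utilization assumption. A back-of-envelope sketch; the per-GPU peak and the 10% sustained utilization here are illustrative assumptions, not measured numbers:

```python
# Rough check: total FLOPs / sustained cluster throughput = wall-clock time.
total_flops = 2e25                 # estimated GPT-4 training compute
gpus = 25_000
peak_per_gpu = 1e15                # ~1 PFLOP/s peak per GPU (assumed)
utilization = 0.10                 # assumed sustained fraction of peak

seconds = total_flops / (gpus * peak_per_gpu * utilization)
days = seconds / 86_400
print(f"{days:.0f} days")          # ~93 days, i.e. roughly 3 months
```

Real clusters sustain anywhere from 10% to 40% of peak depending on model and interconnect, so the point is the order of magnitude, not the exact day count.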

04 — A CHIP BORN ONLY TO MULTIPLY

Tensor Cores: Hardware Dedicated to "Thought"

Eventually, humanity built hardware dedicated solely to matrix multiplication. A Tensor Core performs a 4×4 matrix multiply in a single clock cycle. Not general-purpose computing: silicon designed exclusively for AI's "thinking." The moment machine thought became important enough to deserve its own physical organ.
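What "a 4×4 multiply per clock" buys you is blocking: a large matmul decomposes exactly into sums of small tile products, so hardware that executes one tile per cycle chews through the whole matrix. A sketch of that decomposition in plain Python; the tile size of 4 is chosen only to mirror the Tensor Core shape, and real kernels pick tiles to fit registers and shared memory:

```python
def matmul_tiled(A, B, t=4):
    """Block matmul: output tile (I,J) accumulates A tile (I,K) x B tile (K,J).
    Each t x t tile product is what a Tensor Core executes as one instruction."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for I in range(0, n, t):
        for J in range(0, n, t):
            for K in range(0, n, t):
                # one t x t tile product -- a single Tensor Core op
                for i in range(I, I + t):
                    for j in range(J, J + t):
                        for k in range(K, K + t):
                            C[i][j] += A[i][k] * B[k][j]
    return C

n = 8
A = [[i + j for j in range(n)] for i in range(n)]
I8 = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
assert matmul_tiled(A, I8) == A    # multiplying by the identity returns A
```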

[Interactive demo: CUDA cores (one multiply at a time) vs Tensor Cores (a 4×4 tile at once), with live progress bars and a speedup readout.]

05 — AN UNCOMFORTABLE LAW

Scaling Laws: Can You Buy Intelligence?

In 2020, OpenAI discovered something disconcerting: AI "intelligence" follows a power law in invested compute. Double the GPUs, the electricity, the money, and the performance gain is predictable. This means intelligence is engineerable, and purchasable. And this is the mathematical justification for a multi-trillion-dollar GPU arms race.

L(C) ∝ C^(−α): loss decreases as a power law of compute (α ≈ 0.05)
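The exponent α ≈ 0.05 makes the law concrete: since L(C) ∝ C^(−α), doubling compute multiplies the loss by 2^(−0.05) ≈ 0.966, roughly a 3.4% reduction per doubling. A few lines show how brutally slow that curve is (α taken from the figure above):

```python
alpha = 0.05
per_doubling = 2 ** -alpha          # loss multiplier for 2x compute
print(f"{per_doubling:.4f}")        # ~0.9659: each doubling shaves ~3.4%

# Doublings needed to halve the loss: solve 2^(-alpha*d) = 0.5 -> d = 1/alpha
doublings = 1 / alpha
print(f"{doublings:.0f} doublings, i.e. {2 ** doublings:,.0f}x more compute")
```

Halving the loss costs about a million times more compute, which is why the curve reads as "you can buy intelligence, but the price grows exponentially."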

06 — THE SILICON ARMS RACE

Who Can "Think" the Most?

| Company | Primary chips | Est. GPU count | Flagship model |
| --- | --- | --- | --- |
| Meta | H100 + custom MTIA | ~600,000 H100-equiv | Llama 4 |
| Google | TPU v5e/v6 (Trillium) | ~millions of TPU chips | Gemini 3 |
| Microsoft/OpenAI | H100/H200 + Azure | ~500,000+ | GPT-5.4 |
| Anthropic | H100 (AWS/GCP) | Undisclosed | Claude Opus 4.6 |
| xAI | H100 (Memphis cluster) | ~200,000 | Grok 3 |

One H100 costs roughly $30,000–40,000; GPT-4's training cost is estimated at ~$100M. AI's "thinking" is not free. Tens of thousands of GPUs consume power for months, multiplying matrices over and over, and electricity alone costs millions. This is why AI has become a game for a handful of giants.
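The "electricity alone costs millions" claim survives a napkin check. Every input below is an assumption for illustration: 25,000 GPUs (matching the training estimate earlier), ~700 W board power per GPU, and a $0.10/kWh industrial rate:

```python
# Napkin estimate of training-run electricity cost (all inputs assumed).
gpus = 25_000
watts_per_gpu = 700                 # H100-class board power
hours = 3 * 30 * 24                 # ~3 months of continuous training
price_per_kwh = 0.10                # assumed industrial rate, USD

kwh = gpus * watts_per_gpu * hours / 1000
cost = kwh * price_per_kwh
print(f"{kwh:,.0f} kWh -> ${cost:,.0f}")   # ~37,800,000 kWh -> ~$3.8M
```

And this counts only the GPUs; cooling and networking overhead (datacenter PUE) would push the real bill higher still.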

07 — THE CONNECTION

The Human Brain vs. GPUs

| | Human Brain | GPU Cluster |
| --- | --- | --- |
| 🧠 Units | 86 billion neurons | Thousands of CUDA/Tensor cores |
| ⚡ Speed | ~100 Hz (slow but massively parallel) | ~2 GHz (fast and massively parallel) |
| 🔌 Power | ~20 W (astonishing efficiency) | ~700 W/GPU × tens of thousands |
| 💾 Memory | ~2.5 PB (estimated) | 80 GB/GPU × tens of thousands |
| Core op | Synaptic transmission, plasticity | Matrix multiply, backpropagation |
| 🎓 Learning | Experience-based, years | Data-based, weeks to months |
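One number the comparison implies but never states: the raw power gap. Taking a brain at ~20 W against a 25,000-GPU cluster at ~700 W each (the cluster size is an assumption carried over from the training estimate earlier):

```python
# Power gap between a brain and a training cluster (GPUs only).
brain_watts = 20
cluster_watts = 25_000 * 700        # excludes cooling and networking
ratio = cluster_watts / brain_watts
print(f"{ratio:,.0f}x")             # the cluster draws ~875,000x more power
```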

So, Does a Machine
Think?

In this series we have explored a machine's dreams, understanding, memory,
imagination, voice, and vision.
And now the physical truth beneath all of it is revealed:
matrix multiplication.

This means two things.
Machine thought is nothing more than multiplication.
And multiplication alone can do all of this.

Which of these facts is more astonishing,
nobody yet knows.

โ† Part VI: See
edu.kimsh.kr