The loss curve

About

The loss curve is a code-first course on building a language model. Every chapter gives you code that runs in the browser or on your machine, then visualizes what that code produced, using the same data the next chapter picks up. By the end you have a working transformer training on your own machine.

It is the constructive counterpart to Step by Token. That site explains how an LLM works; this one shows how you build one. The two are designed to be read together, but neither requires the other.

Methodology

The course's organizing principle is artifact-first, code-first. Every chapter starts with runnable code: read the function, run it, look at what came out, then read the prose that explains the important pieces.

The chapters build cumulatively. The bigram model from chapter 1 is smoothed in chapter 2, gets a learned tokenizer in chapter 3, gets dense embeddings in chapter 4, gets attention in chapter 8, becomes a transformer block in chapter 10, then moves into local Python training and inference.
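
To give a feel for the starting point, here is a minimal sketch of a bigram model with add-one smoothing, roughly the territory of chapters 1 and 2. It is not the course's own code: the function names `train_bigram` and `next_char_probs`, the `alpha` parameter, and the choice to work at character level are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def next_char_probs(counts, vocab, prev, alpha=1.0):
    """Add-alpha smoothed distribution over the next character.

    Hypothetical helper: alpha=1.0 is add-one (Laplace) smoothing,
    which keeps unseen pairs from getting probability zero.
    """
    row = counts.get(prev, Counter())
    total = sum(row.values()) + alpha * len(vocab)
    return {ch: (row[ch] + alpha) / total for ch in vocab}


text = "abab"
vocab = sorted(set(text))
counts = train_bigram(text)
probs = next_char_probs(counts, vocab, "a")
print(probs)  # {'a': 0.25, 'b': 0.75}
```

In the training text "a" is only ever followed by "b", yet smoothing still leaves some probability for "a" following "a". That small reserve is what lets the model score text it has never seen.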

The pedagogical guarantee is that nothing is a black box. Every line of code the reader meets is something they can open. The production implementation in lib/ml/ is available for cross-reference and is itself short, tested, and shaped to match the chapter.

Inspirations

The presentation owes a lot to The Nature of Code by Daniel Shiffman: manipulable sketches as pedagogical units, an open, conversational voice, and the patience to make a single concept interesting before moving on.

The architecture is informed by Andrej Karpathy's nanoGPT, Distill, Jay Alammar, and the public work that made scaling laws and mechanistic analysis legible outside frontier labs.

Credits

Built with Next.js, React, Tailwind CSS, MDX, KaTeX, Shiki, D3, and Radix UI.

Typography: Source Serif 4, Inter, JetBrains Mono, served via next/font.

The reference dataset for chapters 11-15 is TinyShakespeare (public domain). Tokenization in those chapters uses tiktoken with the GPT-2 vocabulary.

v0.1 · 21 chapters · stage 6