The Loss Curve

An interactive course

Build an LLM from scratch, one runnable layer at a time.

Twenty-one chapters, each a runnable code lab. You inspect the tokenizer, the attention head, the training loop, the optimizer. You finish with a small transformer running on your own machine, load real GPT-2 weights into the same code, and ship one specialized assistant for a narrow domain.

You will not build ChatGPT. You will build a small, honest, inspectable GPT-like model where you understand every line — and that is the whole point.

The constructive counterpart to Step by Token: where Step by Token explains how an LLM works, this site shows how you build one.

What you'll build

  • your own tokenizer (BPE) trained on your data
  • a Transformer block written by you in ~150 lines of PyTorch
  • a small GPT trained from scratch on your laptop
  • the same architecture loaded with real GPT-2 124M weights
  • an instruction-tuned chatbot you can talk to
  • all of it in a single my-llm/ project you own

Who this is for

  • you can read JavaScript or Python (or are happy to learn as you go)
  • you've used an LLM and want to understand what's actually inside it
  • you prefer running code to reading equations
  • you want a small model you understand, not a benchmark winner

What's inside

  1. Part 0: Before you start

    ch 0

    Python, venv, PyTorch — the local toolchain. Skip if you already have a 3.11+ Python and pip install torch is muscle memory.

  2. Part I: Start the project

    ch 1-4

    Tokens, bigrams, BPE, embeddings. You start the local project and build the first pieces of a language model.

  3. Part II: Make it learn

    ch 5-7

    Single neuron, MLP, optimizers. The model stops counting and starts improving through gradients.

  4. Part III: Build the transformer

    ch 8-10

    Attention, multiple heads, residual connections, and the complete transformer block used by modern LLMs.

  5. Part IV: Train and use the LLM

    ch 11-16

    Prepare data, switch to PyTorch, train a small GPT, load real GPT-2 weights into the same code, sample, and read its failure modes honestly.

  6. Part V: Make it useful, cheaper, and usable

    ch 17-21

    Instruction-tuning, LoRA, quantization, a chat loop, and a capstone where you ship one specialized assistant end-to-end.

  7. Part VI: Appendices

    optional

    Optional deep dives that complement the main path: math derivations and second-look explanations of concepts the chapters use without unpacking.
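
As a taste of where Part I begins, a bigram counter can be sketched in a few lines of plain Python. This is a minimal illustration and not code taken from the course: the toy corpus and the function names `count_bigrams` and `most_likely_next` are hypothetical, and it works on single characters rather than BPE tokens.

```python
from collections import Counter, defaultdict


def count_bigrams(text):
    """Count how often each character is followed by each other character."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, char):
    """Return the most frequent follower of `char`, or None if it was never seen."""
    followers = counts.get(char)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, chosen only for illustration.
bigram_counts = count_bigrams("banana bandana")
print(most_likely_next(bigram_counts, "b"))
```

A model like this only counts. It has no parameters to adjust, which is the gap Part II closes when it introduces gradients.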

What you need

  • Comfort reading JavaScript or Python. Almost every chapter ships an interactive JS cell in the browser; chapters 11–21 also run Python locally.
  • Python 3.11+ on your machine (3.13 recommended). Local training, fine-tuning, and inference live in my-llm/. PyTorch on CPU is enough; MPS / CUDA help. The on-ramp is chapter 0 — Before you start.
  • ~13 hours of focus, ~2 GB of disk. Each chapter is 7–18 minutes of reading plus run time. PyTorch (~200 MB), training checkpoints, and the GPT-2 weight download (~500 MB) fit under 2 GB.
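
To check the Python requirement above before starting, something like the following sketch may help. It uses only the standard library; the helper names `python_is_recent_enough` and `torch_is_installed` are hypothetical and not part of the course project, and chapter 0 remains the reference for the actual setup.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in the requirements above


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


def torch_is_installed():
    """Return True if a module named `torch` can be located, without importing it."""
    return importlib.util.find_spec("torch") is not None


print("Python version OK:", python_is_recent_enough())
print("PyTorch found:", torch_is_installed())
```

If the second line prints `False`, PyTorch is not yet installed in the active environment.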

Frequently asked questions

What is The Loss Curve?

An interactive, code-first course on building a GPT-style language model from scratch. 21 chapters take you from a bigram counter to a working chatbot you can talk to.

Who is this course for?

Developers, technical learners, and indie hackers who want to understand LLMs by building one. You should be comfortable reading code (JavaScript or Python) — no ML background required.

Do I need a GPU?

No. Every chapter runs on a normal laptop. The training chapter uses a small model and a small dataset so it finishes on CPU; GPU is faster but not required.

What languages does the course use?

JavaScript for the interactive browser cells, Python and PyTorch for the local project. You don't need to know both — pick the one that matches the chapter you're on.

Is it free?

Yes. All chapters are free to read. The local project is yours to keep, modify, and ship.

How is this different from a video course?

Every concept is a piece of code you can run, modify, and inspect. Nothing is a black box. The course pairs short explanations with runnable artifacts you save into your own project.

Do I need to know Python?

Basic Python is enough. Chapter 0 covers setup. Chapters 11 onward are written in PyTorch, and each piece is explained step by step.

What model will I have at the end?

A small, GPT-2-architecture model you trained yourself, plus the same architecture loaded with real GPT-2 124M weights, plus a fine-tuned chat version of either.

Ready to start?

Open chapter 1 — it takes 15 minutes.