The loss curve

Alignment

The work of turning a next-token predictor into a useful assistant: first SFT (supervised fine-tuning) on human-written examples, then RLHF (reinforcement learning from human feedback) or DPO (direct preference optimization) on human preference data.
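To make the preference step concrete, here is a minimal sketch of the DPO loss for a single preference pair. It assumes you already have the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under both the model being trained and a frozen reference model, typically the SFT checkpoint. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative choices, not something the definition above specifies.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a full response.
    beta scales how strongly the policy is pushed away from the reference
    (0.1 is an assumed default for illustration).
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Positive when the policy favors the chosen response more than the
    # reference does, relative to the rejected one.
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # Policy identical to the reference: no preference signal learned yet.
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

When the policy matches the reference, the loss equals log 2; it falls as the policy raises the chosen response's probability relative to the rejected one. In practice the log-probabilities come from a forward pass over batches of pairs, and the loss is averaged over the batch.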