Alignment
The work of turning a next-token predictor into a useful assistant: SFT (supervised fine-tuning) on human-written examples, then RLHF or DPO on human preference data.
Continue
The work of turning a next-token predictor into a useful assistant: SFT (supervised fine-tuning) on human-written examples, then RLHF or DPO on human preference data.
Continue