The loss curve

Quantization

Storing weights as low-precision integers (INT8, INT4) instead of 32-bit floats. Cuts model size 4–8× and speeds up inference, usually with minimal quality loss.
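The core idea can be sketched as symmetric per-tensor INT8 quantization: scale the float weights so their largest magnitude maps to 127, round to integers, and store the integers plus one scale factor. (A minimal illustration in NumPy; the function names are hypothetical, and real schemes often use per-channel scales or asymmetric zero points.)

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-tensor quantization: map floats into [-127, 127].
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights for use at inference time.
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# INT8 storage is 4x smaller than FP32, and rounding error
# is bounded by half the scale step.
print(w.nbytes / q.nbytes)          # 4.0
print(np.abs(w - w_hat).max() <= s / 2 + 1e-6)
```

INT4 halves the storage again (8× vs FP32) at the cost of a much coarser grid, which is why INT4 schemes typically quantize in small groups, each with its own scale.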