The loss curve

Quantization

Storing weights as low-precision integers (INT8, INT4) instead of 32-bit floats. Cuts model size 4–8× and speeds up inference, usually with minimal quality loss.
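The core idea can be sketched as symmetric per-tensor INT8 quantization: scale the float weights so their largest magnitude maps to 127, round to integers, and store the integers plus one scale factor. (A minimal illustration in NumPy; the function names are hypothetical, and real schemes often use per-channel scales or asymmetric zero points.)

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-tensor quantization: map floats into [-127, 127].
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights for use at inference time.
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# INT8 storage is 4x smaller than FP32, and rounding error
# is bounded by half the scale step.
print(w.nbytes / q.nbytes)          # 4.0
print(np.abs(w - w_hat).max() <= s / 2 + 1e-6)
```

INT4 halves the storage again (8× vs FP32) at the cost of a much coarser grid, which is why INT4 schemes typically quantize in small groups, each with its own scale.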