The loss curve

Transformer

Neural network architecture built from a stack of identical blocks, each combining multi-head self-attention and a position-wise feed-forward network (FFN), wrapped in residual connections and layer normalization. It has been the dominant architecture for language models since its introduction in 2017.
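The block structure above can be sketched as a single forward pass. This is a minimal NumPy illustration, not any particular library's implementation: the weight matrices are random stand-ins for learned parameters, the head count and model width are arbitrary, and it uses the pre-norm variant (`x + Attn(LN(x))`) with a ReLU FFN.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(x, n_heads):
    seq, d = x.shape
    dh = d // n_heads
    # Random projections stand in for learned Q, K, V, and output weights.
    rng = np.random.default_rng(0)
    Wq, Wk, Wv, Wo = (rng.normal(0, d**-0.5, (d, d)) for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Split the model dimension into heads: (n_heads, seq, dh).
    split = lambda t: t.reshape(seq, n_heads, dh).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    out = softmax(scores) @ v                       # (n_heads, seq, dh)
    out = out.transpose(1, 0, 2).reshape(seq, d)    # merge heads back
    return out @ Wo

def transformer_block(x, n_heads=4):
    # Pre-norm block: x + Attn(LN(x)), then x + FFN(LN(x)).
    rng = np.random.default_rng(1)
    d = x.shape[-1]
    W1 = rng.normal(0, d**-0.5, (d, 4 * d))         # FFN expands 4x ...
    W2 = rng.normal(0, (4 * d)**-0.5, (4 * d, d))   # ... then projects back
    x = x + multi_head_attention(layer_norm(x), n_heads)
    ffn = np.maximum(layer_norm(x) @ W1, 0) @ W2    # ReLU FFN
    return x + ffn

x = np.random.default_rng(2).normal(size=(8, 32))   # 8 tokens, d_model=32
y = transformer_block(x)
print(y.shape)  # (8, 32): same shape in and out, so blocks stack
```

Because each block maps a `(seq, d_model)` array to another array of the same shape, blocks can be stacked arbitrarily deep, which is what "stacked blocks" refers to.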