Transformer
Neural network architecture built from stacked blocks, each pairing multi-head self-attention with a position-wise feed-forward network (FFN), wrapped in residual connections and layer normalization. The dominant architecture for language models since its introduction in 2017.
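The block structure described above can be sketched minimally in NumPy. This is a hypothetical single-head, pre-norm illustration with random (untrained) weights, not a faithful implementation: real transformers use multiple attention heads, learned parameters, masking, and positional information.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention (no mask).
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    # Pre-norm block: x + Attn(LN(x)), then x + FFN(LN(x)).
    x = x + attention(layer_norm(x), Wq, Wk, Wv)
    h = layer_norm(x)
    x = x + np.maximum(0, h @ W1) @ W2  # ReLU feed-forward
    return x

rng = np.random.default_rng(0)
d, seq = 16, 4                          # model width, sequence length
x = rng.normal(size=(seq, d))           # one "token embedding" per row
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
W1 = rng.normal(size=(d, 4 * d)) * 0.1  # FFN expands 4x, then projects back
W2 = rng.normal(size=(4 * d, d)) * 0.1
y = transformer_block(x, Wq, Wk, Wv, W1, W2)
print(y.shape)  # (4, 16) — same shape in and out, so blocks stack
```

Because each block maps a `(seq, d)` array to a `(seq, d)` array, dozens of such blocks can be stacked, which is what "stacked blocks" in the definition refers to.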