The loss curve

Residual connection

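output = input + sublayer(input). The identity path lets gradients flow cleanly through deep stacks, since each block's derivative is the identity plus the sublayer's derivative. It was the key innovation that made very deep CNNs (ResNets) practical, and is now in every transformer.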
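A minimal sketch of the formula above, in plain Python on lists of floats. The names `residual` and `scale_by` are hypothetical helpers made up for illustration, and the toy sublayer stands in for what would be an attention or feed-forward layer in a real model.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def residual(sublayer: Layer) -> Layer:
    """Wrap a sublayer so the block computes input + sublayer(input)."""
    def wrapped(x: Vector) -> Vector:
        y = sublayer(x)
        # The addition only makes sense if the sublayer preserves the shape.
        if len(y) != len(x):
            raise ValueError("sublayer must preserve the input shape")
        return [xi + yi for xi, yi in zip(x, y)]
    return wrapped


def scale_by(w: float) -> Layer:
    """Hypothetical toy sublayer: multiply every element by w."""
    return lambda x: [w * xi for xi in x]


# A residual block around a sublayer that halves its input.
block = residual(scale_by(0.5))
out = block([1.0, 2.0, -4.0])

# If the sublayer outputs zeros, the block reduces to the identity,
# so the input (and, in training, its gradient) passes through unchanged.
zero_block = residual(scale_by(0.0))
identity_out = zero_block([1.0, 2.0, -4.0])
```

Here `out` is the input plus half of itself, and `identity_out` equals the input. The second case is the property that matters for depth: a block that has learned nothing yet still passes its input through.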