Residual connection
output = input + sublayer(input). Lets gradients flow cleanly through deep stacks; was the key innovation that made deep CNNs practical, now in every transformer.
Continue
output = input + sublayer(input). Lets gradients flow cleanly through deep stacks; was the key innovation that made deep CNNs practical, now in every transformer.
Continue