[22] Transformers without Tears: Improving the Normalization of Self-Attention

April 21, 2022 · 3 min · long8v