[102] Attention Augmented Convolutional Networks

February 16, 2023 · 1 min · long8v

[49] Sparse Graph Attention Networks

August 10, 2022 · 3 min · long8v

[22] Transformers without Tears: Improving the Normalization of Self-Attention

April 21, 2022 · 3 min · long8v

[12] BBPE: Neural Machine Translation with Byte-Level Subwords

February 18, 2022 · 3 min · long8v