[110] Understanding the Role of Self Attention for Efficient Speech Recognition 2022Q1 ICLR 25min transformer
[89] Relational Attention: Generalizing Transformers for Graph-Structured Tasks microsoft graph 2022Q4 transformer
[71] Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers 25min sparse 2022Q4 transformer