[110] Understanding the Role of Self Attention for Efficient Speech Recognition

April 17, 2023 Β· 2 min Β· long8v

[94] Recipe for a General, Powerful, Scalable Graph Transformer

January 3, 2023 Β· 1 min Β· long8v

[89] Relational Attention: Generalizing Transformers for Graph-Structured Tasks

December 15, 2022 Β· 2 min Β· long8v

[71] Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

October 17, 2022 Β· 1 min Β· long8v