[110] Understanding the Role of Self Attention for Efficient Speech Recognition

April 17, 2023 · 2 min · long8v

[89] Relational Attention: Generalizing Transformers for Graph-Structured Tasks

December 15, 2022 · 2 min · long8v

[71] Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

October 17, 2022 · 1 min · long8v