[71] Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

October 17, 2022 · 1 min · long8v