[60] Efficient Sparsely Activated Transformers

September 2, 2022 Β· 1 min Β· long8v

[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

August 25, 2022 Β· 2 min Β· long8v

MoEBERT code reading

May 23, 2022 Β· 1 min Β· long8v

[26] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

May 13, 2022 Β· 1 min Β· long8v

Sparse MoE code reading

May 10, 2022 Β· 1 min Β· long8v