[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models LM MoE 2022Q3 25min
[26] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts 2018 MoE KDD