
idea: Create a multi-gate mixture-of-experts (MMoE) that can model multiple tasks without having to explicitly specify how the tasks are related.

In typical multi-task learning, there is a shared network (shared bottom) with a task-specific FCN built on top of it for each task. This paper combines that setup with the MoE idea, using the set of experts as the shared bottom. The original MoE has a single gating network, but MMoE creates one gating network per task k.

Each gating network is a simple linear classifier whose input_dim is the feature dimension and whose output_dim is num_experts; a softmax over its logits gives each task its own mixture weights over the experts.
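The structure above (shared experts, one softmax gate per task, one tower per task) can be sketched as a forward pass in numpy. All dimensions and the single-layer experts/towers are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
input_dim, hidden_dim, num_experts, num_tasks = 8, 16, 4, 2

# Experts: the shared bottom, here single ReLU layers.
W_exp = 0.1 * rng.normal(size=(num_experts, input_dim, hidden_dim))
# One gating network per task: a linear map from features to expert logits.
W_gate = 0.1 * rng.normal(size=(num_tasks, input_dim, num_experts))
# Task-specific towers, here single linear output layers.
W_tower = 0.1 * rng.normal(size=(num_tasks, hidden_dim, 1))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mmoe_forward(x):
    """x: (batch, input_dim) -> list of (batch, 1) outputs, one per task."""
    # Every expert processes every input: (num_experts, batch, hidden_dim).
    expert_out = np.maximum(0.0, np.einsum("bi,eih->ebh", x, W_exp))
    outputs = []
    for k in range(num_tasks):
        # Task k's gate: softmax over experts, shape (batch, num_experts).
        gate = softmax(x @ W_gate[k])
        # Gate-weighted mixture of expert outputs: (batch, hidden_dim).
        mixed = np.einsum("be,ebh->bh", gate, expert_out)
        outputs.append(mixed @ W_tower[k])
    return outputs

x = rng.normal(size=(5, input_dim))
y = mmoe_forward(x)
```

Note that the experts are shared across tasks; only the gates and towers are task-specific, which is what lets the model learn how much each task shares without that relation being specified up front.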

The evaluation on synthetic data is as follows: the higher the correlation between the tasks, the better every model performs; conversely, MMoE's advantage over the shared-bottom and single-gate MoE baselines grows as the task correlation decreases.

The evaluation on real data is as follows:

One-line comment: Hmm… is there any way to give each gating network an initial value for the per-task correlation?