
Details
Multi-task Learning
Why does it work?
- 1) prevents overfitting to a single task, 2) aggregates data across tasks, 3) learns an "inductive bias", and 4) learns good shared features.
Common MTL Model Structure
hard parameter sharing vs soft parameter sharing
- hard parameter sharing

Share the hidden layers across all tasks and attach a separate, task-specific output layer for each task.
- soft parameter sharing

Stack a separate network for each task and impose an L2-norm penalty so that the parameters of the networks don't drift too far apart.
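A minimal NumPy sketch of both schemes (toy dimensions; names like `W_shared` and the two tasks are illustrative, not from any particular paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hard parameter sharing: one shared trunk, a separate head per task.
W_shared = rng.normal(size=(16, 32))   # trunk shared by all tasks
W_head_a = rng.normal(size=(32, 3))    # task A: 3-way classification
W_head_b = rng.normal(size=(32, 1))    # task B: regression

def forward(x):
    h = np.maximum(x @ W_shared, 0.0)  # shared ReLU features
    return h @ W_head_a, h @ W_head_b  # task-specific outputs

x = rng.normal(size=(4, 16))
out_a, out_b = forward(x)
print(out_a.shape, out_b.shape)  # (4, 3) (4, 1)

# Soft parameter sharing: each task keeps its OWN trunk, and an L2
# penalty added to the loss keeps the trunks close instead of identical.
W_trunk_a = W_shared + 0.01 * rng.normal(size=W_shared.shape)
W_trunk_b = W_shared + 0.01 * rng.normal(size=W_shared.shape)
l2_penalty = np.sum((W_trunk_a - W_trunk_b) ** 2)
```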
Recent work on MTL for deep learning
Deep Relationship Networks

Impose a matrix prior on the fully connected layers to let the model learn the relationships between tasks.

Cross-stitch network

Keep a separate network for each task, and at each layer mix the activations of the networks with a linear combination whose weights $\alpha$ are trainable.
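A cross-stitch unit is easy to sketch in NumPy: a trainable 2x2 matrix of alphas mixes the activations of the two task networks at a given layer (the near-identity initialization here is illustrative):

```python
import numpy as np

# Cross-stitch unit: mix the activations of two task networks with a
# trainable 2x2 alpha matrix, applied per layer.
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])  # near identity: tasks stay mostly separate

def cross_stitch(x_a, x_b, alpha):
    # Each output activation is a linear combination of both inputs.
    x_a_new = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    x_b_new = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return x_a_new, x_b_new

x_a = np.ones((2, 4))    # activations from task A's network
x_b = np.zeros((2, 4))   # activations from task B's network
y_a, y_b = cross_stitch(x_a, x_b, alpha)
print(y_a[0, 0], y_b[0, 0])  # 0.9 0.1
```

In training, the alphas are learned jointly with the network weights, so the model itself decides how much to share between tasks at each layer.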
- Weighting losses with uncertainty

Estimate the uncertainty of each task and use it to set the relative weight of each term in the multi-task loss.
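The uncertainty-weighting loss (Kendall et al., 2018) gives each task $i$ a learned log-variance $s_i$ and combines the losses as $\sum_i e^{-s_i} L_i + s_i$, so high-uncertainty tasks are automatically down-weighted. A small sketch:

```python
import numpy as np

# Uncertainty weighting: each task i gets a learned log-variance s_i;
# the combined loss is sum_i exp(-s_i) * L_i + s_i. The +s_i term keeps
# the model from driving all weights to zero.
def weighted_mtl_loss(task_losses, log_vars):
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))

losses = [2.0, 0.5]
print(weighted_mtl_loss(losses, [0.0, 0.0]))  # 2.5 (equal weighting)
print(weighted_mtl_loss(losses, [1.0, 0.0]))  # first task down-weighted
```

In practice the `log_vars` are trainable parameters optimized together with the network weights.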
Auxiliary tasks
- Related task: closely related tasks make the most natural auxiliary tasks.
- Adversarial: learn by optimizing the opposite of what you want, e.g., in domain adaptation, predict the domain of the input as an auxiliary task and reverse its gradient (Ganin, 2015).
- Hint: use a slightly easier task as a hint. For example, for sentence sentiment, add an easier auxiliary task that just classifies positive vs. negative -> connectivity experiment, remind me!
- Representation learning: since MTL is ultimately about learning good representations, learning a good representation can itself be the auxiliary task, e.g., language modeling or an autoencoder.
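The gradient-reversal trick behind the adversarial auxiliary task above can be sketched without any framework: identity on the forward pass, flipped gradient on the backward pass (the class name and `lam` coefficient are illustrative):

```python
import numpy as np

# Gradient reversal layer (Ganin, 2015): acts as the identity on the
# forward pass, but multiplies the gradient by -lambda on the backward
# pass, so the feature extractor learns to CONFUSE the domain classifier
# while the classifier itself still tries to predict the domain.
class GradientReversal:
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # identity: features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # flip (and scale) the gradient

grl = GradientReversal(lam=0.5)
x = np.array([1.0, 2.0])
assert np.allclose(grl.forward(x), x)
print(grl.backward(np.array([1.0, -1.0])))  # [-0.5  0.5]
```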
Lesson learned
I feel like BERT is really disruptive lol