
Multi-task Learning

Why does it work?

  1) prevents overfitting on any single task, 2) aggregates data across tasks, 3) learns a useful inductive bias, and 4) learns good shared features.

Common MTL Model Structure

hard parameter sharing vs. soft parameter sharing

  • hard parameter sharing: a single network whose lower layers are shared across all tasks, topped by task-specific output layers.
  • soft parameter sharing: stack a separate network for each task and impose an L2-norm penalty so that the parameters of the networks don't drift too far apart.
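The soft-sharing penalty above can be sketched as a plain L2 distance between corresponding parameters of the two task networks (a minimal pure-Python sketch; the parameter values and the `lam` coefficient are illustrative, not from any specific paper):

```python
def soft_sharing_penalty(params_a, params_b, lam=0.01):
    """L2-norm penalty that keeps two task networks' parameters close.

    params_a, params_b: flat lists of corresponding parameter values.
    lam: penalty coefficient (hypothetical value).
    """
    return lam * sum((a - b) ** 2 for a, b in zip(params_a, params_b))

# Identical networks incur zero penalty; diverging ones are penalized,
# so each task network stays near the others while remaining its own model.
task1_params = [0.5, -1.2, 0.3]
task2_params = [0.5, -1.0, 0.1]
penalty = soft_sharing_penalty(task1_params, task2_params, lam=0.5)
```

In training, this penalty would simply be added to the sum of the per-task losses.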

Recent work on MTL for deep learning

  • Deep Relationship Networks: impose matrix priors on the fully connected layers so that the model can learn the relationships between tasks.

  • Cross-stitch networks

Keep a separate network for each task; at each layer, the input to each network is a linear combination of both networks' activations, with trainable mixing weights $\alpha$.
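A cross-stitch unit can be sketched as a learned 2x2 mixing of the two networks' activations at a given layer (pure-Python sketch; the `alpha` values shown are an illustrative near-identity initialization, not learned here):

```python
def cross_stitch(x_a, x_b, alpha):
    """Mix the activations of task A's and task B's networks.

    x_a, x_b: same-length activation vectors from the two networks.
    alpha: 2x2 matrix of trainable mixing weights, where
           alpha[i][j] is the weight of network j's activation
           in the input passed on to network i.
    """
    out_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(x_a, x_b)]
    out_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(x_a, x_b)]
    return out_a, out_b

# Near-identity alpha: each network mostly keeps its own activations
# but leaks a little information from the other task.
alpha = [[0.9, 0.1],
         [0.1, 0.9]]
a_mixed, b_mixed = cross_stitch([1.0, 2.0], [3.0, 4.0], alpha)
```

If $\alpha$ converges to the identity matrix, the two networks decouple; if the off-diagonal terms grow, the tasks share more.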

  • Weighting losses with uncertainty

Estimate each task's (homoscedastic) uncertainty and use it to set the relative weight of that task in the multi-task loss (Kendall et al., 2018).
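The uncertainty-weighted loss can be sketched with the common log-variance parameterization (a hedged sketch: `log_vars` would normally be trainable parameters, one per task, and some variants add a factor of 1/2 per term):

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses, weighting each by a learned uncertainty.

    task_losses: list of per-task loss values L_i.
    log_vars: list of s_i = log(sigma_i^2), normally trainable parameters.
    Each task contributes exp(-s_i) * L_i + s_i: noisy (high-uncertainty)
    tasks are down-weighted, while the +s_i regularizer stops the model
    from driving every s_i to infinity to zero out all the losses.
    """
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

# With all log-variances at 0 this reduces to a plain sum of the losses.
total = uncertainty_weighted_loss([0.7, 2.3], [0.0, 0.0])
```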

Auxiliary tasks

  • Related task: training on a closely related task as an auxiliary objective tends to help the main task.
  • Adversarial: learn by doing the opposite of what you want, e.g., in domain adaptation, predict the domain of the input but reverse the gradient so the learned features become domain-invariant (Ganin, 2015).
  • Hint: use a slightly easier version of the task, e.g., predict coarse positive/negative sentiment as a hint for a finer-grained sentiment task.
  • Representation learning: since MTL is ultimately about learning good representations, learning a good representation can itself be the auxiliary task, e.g., language modeling or an autoencoder objective.
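The gradient-reversal trick in the adversarial bullet can be sketched as a layer that is the identity on the forward pass but flips (and scales) the gradient on the backward pass (a framework-free toy sketch; in practice this is implemented as a custom autograd op, and `lam` is an illustrative scaling factor):

```python
def grl_forward(x):
    """Forward pass: the gradient reversal layer is just the identity."""
    return x

def grl_backward(grad, lam=1.0):
    """Backward pass: flip the sign of the incoming gradient and scale it.

    The feature extractor below the layer receives -lam * grad, so it
    learns to *confuse* the domain classifier instead of helping it,
    pushing the features toward domain invariance.
    """
    return -lam * grad

# The domain classifier sees x unchanged on the way up; its gradient
# reaches the feature extractor with its sign reversed on the way down.
x = 0.42
assert grl_forward(x) == x
```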

Lesson learned

I feel like BERT is really disruptive lol