image

paper

TL;DR

  • I read this because.. : multi-task learning with uncertainty!
  • task : semantic segmentation, instance segmentation, pixel-wise metric depth
  • Problem : The usual multi-task approach optimizes a weighted sum of per-task losses, and performance is very sensitive to the choice of weights.
  • Idea : Assuming a Gaussian over the output $y$ and estimating by MLE, the weight of each task falls out of that task's own noise $\sigma$; i.e., optimize the model weights $W$ and the task-dependent $\sigma_{task}$ jointly.
  • architecture : DeepLab V3 (ResNet101 -> Atrous Spatial Pyramid Pooling) + decoder for 3 tasks
  • objective : CE(semantic segmentation), L1(instance segmentation, depth estimation)
  • baseline : task specific model, weighted multi-task model
  • data : Cityscapes benchmark; the depth labels are pseudo-labels generated with a stereo model named SGM
  • evaluation : IoU, Instance Mean Error, Inverse Depth Mean Error
  • result : SOTA in semantic segmentation and depth prediction when all 3 tasks are trained together; SOTA in instance segmentation when 2 tasks are trained; SOTA in depth prediction when 2 tasks are trained.
  • contribution : the first model trained jointly on these 3 tasks.
  • limitation/things I can’t understand : Roughly speaking, they add a learnable weight per task plus a regularization term to keep the weights from drifting, but it is elegant that this is interpreted from the MLE perspective.

Details

motivation

image

Performance varies sharply depending on the multi-task loss weights

Architecture

image

Homoscedastic uncertainty as task-dependent uncertainty

  • Epistemic uncertainty
    • Uncertainty in the model itself, e.g., due to a lack of training data
  • Aleatoric uncertainty
    • Uncertainty inherent in the data; information the data cannot capture
    • Heteroscedastic (data-dependent): depends on the input data and is predicted as a model output
    • Homoscedastic (task-dependent): does not depend on the input data; constant across inputs but varies between tasks

I don’t fully understand this… Anyway, this paper models the last one: task-dependent (homoscedastic) uncertainty.

Multi-task likelihoods

Let the output of the neural network be $f^W(x)$. In a regression problem, we can assume that the output follows a Gaussian:

$$p(y \mid f^W(x)) = \mathcal{N}(f^W(x), \sigma^2)$$

where $\sigma$ is an observation noise scalar.

For classification problems, squash the output through a softmax to turn it into a probability distribution:

$$p(y \mid f^W(x)) = \mathrm{Softmax}(f^W(x))$$

For a model with multiple outputs, the likelihood factorizes over the tasks:

$$p(y_1, \dots, y_K \mid f^W(x)) = p(y_1 \mid f^W(x)) \cdots p(y_K \mid f^W(x))$$

Following maximum likelihood estimation, the log likelihood for the regression case can be written as

$$\log p(y \mid f^W(x)) \propto -\frac{1}{2\sigma^2} \lVert y - f^W(x) \rVert^2 - \log \sigma$$

For a model output following two Gaussians (two regression tasks), the negative log likelihood becomes

$$-\log p(y_1, y_2 \mid f^W(x)) \propto \frac{1}{2\sigma_1^2} \lVert y_1 - f^W(x) \rVert^2 + \frac{1}{2\sigma_2^2} \lVert y_2 - f^W(x) \rVert^2 + \log \sigma_1 \sigma_2$$

This can now be viewed as a minimization problem for $\mathcal{L}(W, \sigma_1, \sigma_2)$:

$$\mathcal{L}(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{2\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 \sigma_2$$

where $\mathcal{L}_i(W) = \lVert y_i - f^W(x) \rVert^2$.

In this case, $\frac{1}{2\sigma_1^2}$ and $\frac{1}{2\sigma_2^2}$ act as the relative weights of losses 1 and 2, and the last term, $\log \sigma_1 \sigma_2$, acts as a regularizer that keeps the $\sigma$'s from growing without bound.
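
For intuition, here is a minimal numerical sketch (plain Python, my own illustration, not the paper's code): for a fixed task loss $L$, the per-task term $\frac{1}{2\sigma^2} L + \log \sigma$ is minimized at $\sigma^2 = L$, so the noisier task automatically receives the smaller weight $\frac{1}{2\sigma^2}$.

```python
import math

def task_term(loss, sigma):
    # Per-task contribution to L(W, sigma): loss / (2*sigma^2) + log(sigma)
    return loss / (2.0 * sigma ** 2) + math.log(sigma)

L1, L2 = 1.0, 9.0  # pretend task 2 is noisier (larger residual error)

# Brute-force the minimizing sigma on a grid.
sigmas = [0.01 * k for k in range(10, 1001)]  # 0.10 .. 10.00
best_sigma1 = min(sigmas, key=lambda s: task_term(L1, s))
best_sigma2 = min(sigmas, key=lambda s: task_term(L2, s))

# The minimum sits at sigma^2 = loss, so the effective weight is 1/(2*loss).
print(best_sigma1 ** 2, best_sigma2 ** 2)  # ≈ 1.0 and ≈ 9.0
```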

For the classification problem, let’s extend this to a softmax scaled by the scalar $\sigma$:

$$p(y \mid f^W(x), \sigma) = \mathrm{Softmax}\!\left(\frac{1}{\sigma^2} f^W(x)\right)$$

The log likelihood then looks like this:

$$\log p(y = c \mid f^W(x), \sigma) = \frac{1}{\sigma^2} f_c^W(x) - \log \sum_{c'} \exp\!\left(\frac{1}{\sigma^2} f_{c'}^W(x)\right)$$

where $f_c^W(x)$ is the $c$-th element of $f^W(x)$.
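
The $\frac{1}{\sigma^2}$ scaling acts like a temperature: a large $\sigma$ flattens the softmax (more task uncertainty), a small $\sigma$ sharpens it. A quick sketch (my own illustration; `scaled_softmax` is a hypothetical helper, not from the paper):

```python
import math

def scaled_softmax(logits, sigma):
    # Softmax of logits scaled by 1/sigma^2, computed stably.
    scaled = [z / sigma ** 2 for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = scaled_softmax(logits, sigma=0.5)  # low noise -> confident distribution
flat = scaled_softmax(logits, sigma=2.0)   # high noise -> flatter distribution
```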

Combining one regression and one classification output, this again looks like learning a joint loss:

$$\mathcal{L}(W, \sigma_1, \sigma_2) \approx \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 + \log \sigma_2$$

using the approximation $\frac{1}{\sigma_2} \sum_{c'} \exp\!\left(\frac{1}{\sigma_2^2} f_{c'}^W(x)\right) \approx \left(\sum_{c'} \exp f_{c'}^W(x)\right)^{1/\sigma_2^2}$, which becomes exact as $\sigma_2 \to 1$.

Again, $\sigma_1$ and $\sigma_2$ act as learned relative weights of the two task losses.
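
In practice, implementations usually learn $s_i = \log \sigma_i^2$ for numerical stability, which turns each task term into roughly $\frac{1}{2} e^{-s_i} \mathcal{L}_i + \frac{1}{2} s_i$. A minimal gradient-descent sketch on fixed task losses (my own illustration, not the paper's code); at the optimum $e^{s_i} \to \mathcal{L}_i$, so the noisier task ends up with the smaller weight $e^{-s_i}$:

```python
import math

def grads(losses, log_vars):
    # d/ds [0.5 * exp(-s) * L + 0.5 * s] = -0.5 * exp(-s) * L + 0.5
    return [-0.5 * math.exp(-s) * L + 0.5 for L, s in zip(losses, log_vars)]

losses = [2.0, 8.0]    # pretend these are the (fixed) per-task losses
log_vars = [0.0, 0.0]  # s_i = log(sigma_i^2), initialized to 0
lr = 0.1

for _ in range(2000):
    g = grads(losses, log_vars)
    log_vars = [s - lr * gi for s, gi in zip(log_vars, g)]

weights = [math.exp(-s) for s in log_vars]  # effective per-task weights
print(weights)  # the noisier task (loss 8.0) gets the smaller weight
```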

Result

image image