image

paper

TL;DR

  • I read this because.. : multi-task learning with uncertainty!
  • task : semantic segmentation, instance segmentation, pixel-wise metric depth
  • Problem : The usual multi-task approach optimizes a weighted sum of per-task losses, and performance is very sensitive to the choice of weights.
  • Idea : Assuming a Gaussian over the output $y$ and estimating by MLE, the weight of each task falls out of that task's own noise $\sigma$; i.e., optimize the model weights $W$ and the task-dependent $\sigma_{task}$ jointly.
  • architecture : DeepLab V3 (ResNet101 -> Atrous Spatial Pyramid Pooling) + decoder for 3 tasks
  • objective : CE(semantic segmentation), L1(instance segmentation, depth estimation)
  • baseline : task specific model, weighted multi-task model
  • data : Cityscapes benchmark; the depth labels are pseudo-labels generated with a stereo model named SGM
  • evaluation : IoU, Instance Mean Error, Inverse Depth Mean Error
  • result : SOTA in semantic segmentation and depth prediction when all 3 tasks are trained together; SOTA in instance segmentation when 2 tasks are trained; SOTA in depth prediction when 2 tasks are trained.
  • contribution : the first model trained jointly on these 3 tasks.
  • limitation/things I can’t understand : Roughly speaking, they add a learnable weight per task plus a regularization term to keep the weights from drifting, but it is elegant that this is interpreted from the MLE perspective.

Details

motivation

image

Performance varies sharply depending on the multi-task loss weights

Architecture

image

Homoscedastic uncertainty as task-dependent uncertainty

  • Epistemic uncertainty
    • Uncertainty in the model itself, e.g., due to a lack of training data
  • Aleatoric uncertainty
    • Uncertainty inherent in the data; information the data cannot capture
    • Heteroscedastic (data-dependent): depends on the input data and is predicted as a model output
    • Homoscedastic (task-dependent): does not depend on the input data; constant across inputs but varies between tasks

I don’t fully understand this… Anyway, this paper models the last one: task-dependent (homoscedastic) uncertainty.

Multi-task likelihoods

Let the output of the neural network be $f^W(x)$. In a regression problem, we can assume that the output follows a Gaussian:

$$p(y \mid f^W(x)) = \mathcal{N}(f^W(x), \sigma^2)$$

where $\sigma$ is an observation noise scalar.

For classification problems, squash the output through a softmax to turn it into a probability distribution:

$$p(y \mid f^W(x)) = \mathrm{Softmax}(f^W(x))$$

For a model with multiple outputs, the likelihood factorizes over the tasks:

$$p(y_1, \dots, y_K \mid f^W(x)) = p(y_1 \mid f^W(x)) \cdots p(y_K \mid f^W(x))$$

Following maximum likelihood estimation, the log likelihood for the regression case can be written as

$$\log p(y \mid f^W(x)) \propto -\frac{1}{2\sigma^2} \lVert y - f^W(x) \rVert^2 - \log \sigma$$

For a model output following two Gaussians (two regression tasks), the negative log likelihood becomes

$$-\log p(y_1, y_2 \mid f^W(x)) \propto \frac{1}{2\sigma_1^2} \lVert y_1 - f^W(x) \rVert^2 + \frac{1}{2\sigma_2^2} \lVert y_2 - f^W(x) \rVert^2 + \log \sigma_1 \sigma_2$$

This can now be viewed as a minimization problem for $\mathcal{L}(W, \sigma_1, \sigma_2)$:

$$\mathcal{L}(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{2\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 \sigma_2$$

where $\mathcal{L}_i(W) = \lVert y_i - f^W(x) \rVert^2$.

In this case, $\frac{1}{2\sigma_1^2}$ and $\frac{1}{2\sigma_2^2}$ act as the relative weights of losses 1 and 2, and the last term, $\log \sigma_1 \sigma_2$, acts as a regularizer that keeps the $\sigma$'s from growing without bound.
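
For intuition, here is a minimal numerical sketch (plain Python, my own illustration, not the paper's code): for a fixed task loss $L$, the per-task term $\frac{1}{2\sigma^2} L + \log \sigma$ is minimized at $\sigma^2 = L$, so the noisier task automatically receives the smaller weight $\frac{1}{2\sigma^2}$.

```python
import math

def task_term(loss, sigma):
    # Per-task contribution to L(W, sigma): loss / (2*sigma^2) + log(sigma)
    return loss / (2.0 * sigma ** 2) + math.log(sigma)

L1, L2 = 1.0, 9.0  # pretend task 2 is noisier (larger residual error)

# Brute-force the minimizing sigma on a grid.
sigmas = [0.01 * k for k in range(10, 1001)]  # 0.10 .. 10.00
best_sigma1 = min(sigmas, key=lambda s: task_term(L1, s))
best_sigma2 = min(sigmas, key=lambda s: task_term(L2, s))

# The minimum sits at sigma^2 = loss, so the effective weight is 1/(2*loss).
print(best_sigma1 ** 2, best_sigma2 ** 2)  # ≈ 1.0 and ≈ 9.0
```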

For the classification problem, let’s extend this to a softmax scaled by the scalar $\sigma$:

$$p(y \mid f^W(x), \sigma) = \mathrm{Softmax}\!\left(\frac{1}{\sigma^2} f^W(x)\right)$$

The log likelihood then looks like this:

$$\log p(y = c \mid f^W(x), \sigma) = \frac{1}{\sigma^2} f_c^W(x) - \log \sum_{c'} \exp\!\left(\frac{1}{\sigma^2} f_{c'}^W(x)\right)$$

where $f_c^W(x)$ is the $c$-th element of $f^W(x)$.
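
The $\frac{1}{\sigma^2}$ scaling acts like a temperature: a large $\sigma$ flattens the softmax (more task uncertainty), a small $\sigma$ sharpens it. A quick sketch (my own illustration; `scaled_softmax` is a hypothetical helper, not from the paper):

```python
import math

def scaled_softmax(logits, sigma):
    # Softmax of logits scaled by 1/sigma^2, computed stably.
    scaled = [z / sigma ** 2 for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = scaled_softmax(logits, sigma=0.5)  # low noise -> confident distribution
flat = scaled_softmax(logits, sigma=2.0)   # high noise -> flatter distribution
```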

Combining one regression and one classification output, this again looks like learning a joint loss:

$$\mathcal{L}(W, \sigma_1, \sigma_2) \approx \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 + \log \sigma_2$$

using the approximation $\frac{1}{\sigma_2} \sum_{c'} \exp\!\left(\frac{1}{\sigma_2^2} f_{c'}^W(x)\right) \approx \left(\sum_{c'} \exp f_{c'}^W(x)\right)^{1/\sigma_2^2}$, which becomes exact as $\sigma_2 \to 1$.

Again, $\sigma_1$ and $\sigma_2$ act as learned relative weights of the two task losses.
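
In practice, implementations usually learn $s_i = \log \sigma_i^2$ for numerical stability, which turns each task term into roughly $\frac{1}{2} e^{-s_i} \mathcal{L}_i + \frac{1}{2} s_i$. A minimal gradient-descent sketch on fixed task losses (my own illustration, not the paper's code); at the optimum $e^{s_i} \to \mathcal{L}_i$, so the noisier task ends up with the smaller weight $e^{-s_i}$:

```python
import math

def grads(losses, log_vars):
    # d/ds [0.5 * exp(-s) * L + 0.5 * s] = -0.5 * exp(-s) * L + 0.5
    return [-0.5 * math.exp(-s) * L + 0.5 for L, s in zip(losses, log_vars)]

losses = [2.0, 8.0]    # pretend these are the (fixed) per-task losses
log_vars = [0.0, 0.0]  # s_i = log(sigma_i^2), initialized to 0
lr = 0.1

for _ in range(2000):
    g = grads(losses, log_vars)
    log_vars = [s - lr * gi for s, gi in zip(log_vars, g)]

weights = [math.exp(-s) for s in log_vars]  # effective per-task weights
print(weights)  # the noisier task (loss 8.0) gets the smaller weight
```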

Result

image image