
TL;DR
- I read this because… : I don’t know the difference between ResNet-50 and ResNet-101 ^^.
- task : image classification, object detection
- problem : take a shallow network and build a deeper one from it by adding only identity-mapping layers. The two are functionally the same network, yet the deeper one shows higher training error. In other words, the deeper the network, the harder it is for training to find a good solution.
- idea : residual connection. Instead of learning the target mapping directly, learn f(x) and output f(x) + x. If the extra depth isn’t needed, the block can learn f(x) = 0 and act as an identity mapping.
- architecture : following VGG’s design principles, 1) layers producing feature maps of the same size use the same number of filters, and 2) when the feature map size is halved, the number of filters is doubled. But the network is stacked deeper and thinner than VGG, so it has fewer parameters and FLOPs than VGG.
- objective : CE loss for classification, object detection loss
- baseline : VGG-16, GoogLeNet, plain (ResNet minus residual connection)
- data : CIFAR-10, COCO 2015, ILSVRC 2015
- evaluation : accuracy, mAP, # params, FLOPS
- result : state of the art on ImageNet image classification; a 28% relative improvement over the previous best on COCO object detection
- contribution : residual connection
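The residual-connection idea above can be sketched in a few lines of NumPy (a minimal sketch; the two-layer branch, weights, and shapes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Hypothetical two-layer residual branch f(x) = W2 @ relu(W1 @ x).
# Zero weights stand in for "we don't need this extra depth": f(x) = 0.
W1 = np.zeros((8, 8))
W2 = np.zeros((8, 8))

relu = lambda v: np.maximum(v, 0.0)
f = W2 @ relu(W1 @ x)
y = f + x  # residual connection: output = f(x) + x

# With f(x) = 0 the block reduces to the identity mapping.
assert np.allclose(y, x)
```

So a deep residual network can always fall back to behaving like its shallower counterpart, which is exactly what the degradation problem says plain networks fail to do.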
Details
Motivation

A phenomenon called degradation: the deeper the network, the higher the training error. That is, it isn’t overfitting; the network simply fails to optimize well.
Residual learning

A residual block must contain at least two layers (with a single layer, f(x) + x = Wx + x is just another linear map, so the shortcut adds nothing) and f(x) must have the same dimensions as x so the two can be added.
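A toy check of the "one layer is just a linear effect" point, with a hypothetical weight matrix W:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# One-layer "residual" branch: Wx + x = (W + I)x is still a single linear map,
# so a plain layer with weights W + I could learn the exact same function.
y_residual = W @ x + x
y_plain = (W + np.eye(4)) @ x
assert np.allclose(y_residual, y_plain)
```

Hence the block needs at least two layers (with a nonlinearity in between) before the shortcut gives anything a plain layer couldn’t already express.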
Network architecture

Network variants

Your questions answered ^^ ResNet-101 is just ResNet-50 with more bottleneck blocks stacked (the conv4 stage has 23 blocks instead of 6), for 101 layers total.
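As a sanity check on the naming, the per-stage bottleneck block counts from the paper’s Table 1 recover the advertised depths:

```python
# Stage block counts (conv2_x..conv5_x) as listed in the ResNet paper's Table 1.
blocks = {"ResNet-50": [3, 4, 6, 3], "ResNet-101": [3, 4, 23, 3]}

# Each bottleneck block holds 3 conv layers; add the stem conv and the final fc.
depths = {name: 3 * sum(stages) + 2 for name, stages in blocks.items()}
print(depths)  # {'ResNet-50': 50, 'ResNet-101': 101}
```

Only the conv4 stage differs (6 vs. 23 blocks), which accounts for the 51 extra layers.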
Figure: training error on ImageNet (plain vs. residual networks)

Other
Early papers are fun to read
- Neural Networks: Tricks of the Trade
- Understanding the difficulty of training deep feedforward neural networks https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf