
TL;DR
- I read this because… : I don’t know the difference between ResNet-50 and ResNet-101 ^^.
- task : image classification, object detection
- problem : take a shallow network and build a deeper one from it by adding only identity-mapping layers. The two are functionally the same network, yet the deeper one shows higher training error. In other words, the deeper the network, the harder it is for training to find a good solution.
- idea : residual connection. Instead of learning the target mapping directly, learn f(x) and output f(x) + x. If the extra depth isn’t needed, the block can learn f(x) = 0 and act as an identity mapping.
- architecture : following VGG’s design principles, 1) layers producing feature maps of the same size use the same number of filters, and 2) when the feature map size is halved, the number of filters is doubled. But the network is stacked deeper and thinner than VGG, so it has fewer parameters and FLOPs than VGG.
- objective : CE loss for classification, object detection loss
- baseline : VGG-16, GoogLeNet, plain (ResNet minus residual connection)
- data : CIFAR-10, COCO 2015, ILSVRC 2015
- evaluation : accuracy, mAP, # params, FLOPS
- result : state of the art on ImageNet image classification; a 28% relative improvement over the previous best on COCO object detection
- contribution : residual connection
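The residual-connection idea above can be sketched in a few lines of NumPy (a minimal sketch; the two-layer branch, weights, and shapes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Hypothetical two-layer residual branch f(x) = W2 @ relu(W1 @ x).
# Zero weights stand in for "we don't need this extra depth": f(x) = 0.
W1 = np.zeros((8, 8))
W2 = np.zeros((8, 8))

relu = lambda v: np.maximum(v, 0.0)
f = W2 @ relu(W1 @ x)
y = f + x  # residual connection: output = f(x) + x

# With f(x) = 0 the block reduces to the identity mapping.
assert np.allclose(y, x)
```

So a deep residual network can always fall back to behaving like its shallower counterpart, which is exactly what the degradation problem says plain networks fail to do.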
Details
Motivation

A phenomenon called degradation: the deeper the network, the higher the training error. That is, it isn’t overfitting; the network simply fails to optimize well.
Residual learning

A residual block must contain at least two layers (with a single layer, f(x) + x = Wx + x is just another linear map, so the shortcut adds nothing) and f(x) must have the same dimensions as x so the two can be added.
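A toy check of the "one layer is just a linear effect" point, with a hypothetical weight matrix W:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# One-layer "residual" branch: Wx + x = (W + I)x is still a single linear map,
# so a plain layer with weights W + I could learn the exact same function.
y_residual = W @ x + x
y_plain = (W + np.eye(4)) @ x
assert np.allclose(y_residual, y_plain)
```

Hence the block needs at least two layers (with a nonlinearity in between) before the shortcut gives anything a plain layer couldn’t already express.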
Network architecture

Network variants

Your questions answered ^^ ResNet-101 is just ResNet-50 with more bottleneck blocks stacked (the conv4 stage has 23 blocks instead of 6), for 101 layers total.
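As a sanity check on the naming, the per-stage bottleneck block counts from the paper’s Table 1 recover the advertised depths:

```python
# Stage block counts (conv2_x..conv5_x) as listed in the ResNet paper's Table 1.
blocks = {"ResNet-50": [3, 4, 6, 3], "ResNet-101": [3, 4, 23, 3]}

# Each bottleneck block holds 3 conv layers; add the stem conv and the final fc.
depths = {name: 3 * sum(stages) + 2 for name, stages in blocks.items()}
print(depths)  # {'ResNet-50': 50, 'ResNet-101': 101}
```

Only the conv4 stage differs (6 vs. 23 blocks), which accounts for the 51 extra layers.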
Figure: training error on ImageNet (plain vs. residual networks)

Other
Early papers are fun to read
- Neural Networks: Tricks of the Trade
- Understanding the difficulty of training deep feedforward neural networks https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf