image

paper

TL;DR

  • task : case-based reasoning
  • problem : The existing ProtoPNet assigns a separate set of prototypes to each class, splits optimization into multiple stages, and relies on negative reasoning (the absence of a prototype) when deciding on a class, so the prototypes become vague.
  • idea : 1) allow prototypes to be shared between classes 2) make the assignment of prototypes to classes differentiable 3) define a focal similarity function so that prototypes are not built from backgrounds, etc.
  • architecture : Looks just like ProtoPNet: CNN => focal similarity with prototypes => soft assignment with Gumbel-Softmax => pooling => classifier
  • objective : CE loss (for learning h) + orthogonality loss
  • baseline : ProtoPNet
  • data : CUB-200-2011, Stanford Cars
  • result : SOTA, capture more salient features
  • contribution : reduce prototypes
  • limitation or part I don’t understand : Fig. 3 doesn’t quite make sense to me… the prototype pool is shared, but there are separate slots for each class, and those prototypes are categorized?

Details

Preliminaries: ProtoPNet (prototypical part network)

This Looks Like That: Deep Learning for Interpretable Image Recognition. The goal is to visualize why an image was classified into a particular class. image

Given an image x, compute the CNN output f(x) of shape H x W x D. At the same time, each of the m prototypes has shape $H_1$ x $W_1$ x D, where $H_1$ and $W_1$ must be smaller than H and W. Since the depth D matches while the height and width are smaller, each prototype can be slid over f(x) like a convolutional filter to produce an activation map.

image
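A minimal NumPy sketch of this patch-matching step, assuming 1 x 1 x D prototypes and ProtoPNet's log-distance similarity (the function name is mine):

```python
import numpy as np

def activation_map(feat, proto, eps=1e-4):
    """Similarity of one prototype to every spatial patch of the CNN output.

    feat  : (H, W, D) CNN output f(x)
    proto : (D,) a single prototype, assuming H1 = W1 = 1
    """
    d2 = ((feat - proto) ** 2).sum(axis=-1)   # squared L2 distance per patch
    return np.log((d2 + 1.0) / (d2 + eps))    # large when the distance is small

feat = np.random.rand(7, 7, 64)
proto = np.random.rand(64)
act = activation_map(feat, proto)
print(act.shape)  # (7, 7)
```

The resulting H x W map is what gets upsampled and overlaid on the input image for visualization.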

The overall ProtoPNet structure is shown above. Training is divided into three phases. (1) Stochastic gradient descent (SGD) of the layers before the last layer learns the prototypes P and the convolutional filters. The loss combines the classification loss with terms on the minimum distance between prototypes and patches of the convolutional output, pulling patches closer to same-class prototypes and pushing them away from other-class prototypes. image
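A hedged sketch of the "closer if same class, farther if different class" terms, stated per image (the function name and this per-image formulation are my assumptions, not the paper's exact code):

```python
import numpy as np

def cluster_sep_costs(min_dist, label, proto_class):
    """Cluster / separation costs for one training image.

    min_dist    : (M,) smallest patch distance of the image to each prototype
    label       : ground-truth class index of the image
    proto_class : (M,) class index each prototype belongs to
    """
    own = proto_class == label
    clst = min_dist[own].min()    # minimized: pull a same-class prototype close
    sep = min_dist[~own].min()    # maximized: push other-class prototypes away
    return clst, sep

min_dist = np.array([0.2, 1.5, 0.9, 2.0])
proto_class = np.array([0, 0, 1, 1])
clst, sep = cluster_sep_costs(min_dist, 0, proto_class)
print(clst, sep)  # 0.2 0.9
```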

(2) Projection of prototypes: each prototype is replaced by the closest patch from a training image of the same class. image

(3) Convex optimization of the last layer: freeze the prototypes and the CNN and learn the weight matrix of h. image

  • The logic behind the model’s classification image

  • How it differs from CAM or partial attention image

Motivation

image

Architecture

image

Each class has K slots, so shared prototypes from a common pool can be assigned to them.

Given an image x, the CNN produces the output f(x) of shape H x W x D (= f(x)). This can be interpreted as H x W vectors in D dimensions. The similarity between these D-dimensional vectors and the prototype assigned to the k-th slot determines that slot's activation.

Focal similarity

Previous studies such as ProtoPNet computed similarity as follows image

image

However, this can lead to (1) all patches z of f(x) being similar to a prototype, i.e., the prototype focusing only on the background, and (2) the gradient flowing only through the most strongly activated patches. To avoid this, the authors propose focal similarity image

image
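A minimal sketch of the focal-similarity idea — max activation minus mean activation — so a prototype that lights up everywhere (e.g., on background) scores near zero, while one that fires on a single salient patch scores high (the function name is mine):

```python
import numpy as np

def focal_similarity(act):
    # act: (H, W) activation map of one prototype over all patches
    return act.max() - act.mean()

flat = np.full((7, 7), 3.0)                    # background-like: uniform activation
peaked = np.zeros((7, 7))
peaked[3, 3] = 3.0                             # salient: one strong patch
print(focal_similarity(flat), focal_similarity(peaked))  # 0.0 vs. ~2.94
```

Note that both maps have the same maximum, but only the peaked one survives the mean subtraction.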

Assigning one prototype per slot

Gumbel-Softmax is used to soften the prototype assignment instead of making it hard, allowing the gradient to flow

image
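A small forward-pass sketch of the Gumbel-Softmax trick in plain NumPy (in a real PyTorch implementation one would use `torch.nn.functional.gumbel_softmax`; the function name and temperature here are my choices):

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, seed=0):
    """Soft, differentiable sample from a categorical over the prototype pool.

    logits : (M,) a slot's unnormalized scores over the M prototypes
    tau    : temperature; lower values approach a hard one-hot assignment
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=logits.shape)
    g = -np.log(-np.log(u))                 # Gumbel(0, 1) noise
    y = np.exp((logits + g) / tau)
    return y / y.sum()

q = gumbel_softmax(np.array([2.0, 0.5, 0.1]))
print(q, q.sum())  # a near-one-hot distribution summing to 1
```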

An orthogonality loss is added so that multiple slots do not end up with the same prototype. image
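A hedged sketch of such an orthogonality loss: penalize the cosine similarity between a class's slot distributions q_k, so different slots are pushed toward different prototypes (the averaging over pairs is my assumption):

```python
import numpy as np

def orthogonality_loss(Q, eps=1e-9):
    """Q: (K, M) slot-to-prototype distributions q_k for one class."""
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + eps)
    S = Qn @ Qn.T                       # pairwise cosine similarities
    K = Q.shape[0]
    # average off-diagonal similarity; 0 when all slots are orthogonal
    return (S.sum() - np.trace(S)) / (K * (K - 1))

Q_same = np.array([[1.0, 0.0], [1.0, 0.0]])   # both slots pick prototype 0
Q_diff = np.array([[1.0, 0.0], [0.0, 1.0]])   # slots pick different prototypes
loss_same = orthogonality_loss(Q_same)
loss_diff = orthogonality_loss(Q_diff)
print(loss_same, loss_diff)  # ~1.0 vs. ~0.0
```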