TL;DR
- I read this because.. : a.k.a. TiBA; interested in explainable CLIP scores. Read as a preliminary.
- task : interpretability of neural networks
- problem : Applying the existing Layer-wise Relevance Propagation (LRP) method to a transformer fails because (1) transformers use skip connections and (2) the activation is not ReLU (e.g., GELU), so negative values appear.
- Idea : (1) modify the rule to account for both positive and negative values, (2) add a normalization term, and (3) combine the attention maps with relevance scores and gradients to get the final score.
- input/output : image -> class // heatmap in image
- architecture : ViT-B, BERT
- baseline : rollout, raw attention, GradCAM, LRP, partial LRP
- data : ImageNet 2012, ImageNet-Segmentation, Movie Reviews
- evaluation : AUC (perturbation tests), pixel accuracy / mAP / mIoU (segmentation), token-F1 (Movie Reviews)
- result : consistently outperforms the prior explainability baselines
- contribution : explainability for transformers; the closest prior approach, attention flow, is computationally slow
- etc. : While reading, I recognized the author as the person who gave the explainability tutorial at CVPR a while back! Also, while looking up the paper I found a Ms. Kim working on the XAI side; she is a Korean woman, and I was glad to see her.
Details
Method
The goal is to get the LRP-based relevance for each attention head in the transformer and combine it with the gradient for class-specific visualization.
Relevance and gradients
The gradient of $y_{t}$ (the output of the model for class t) with respect to $x_j^{(n)}$, element j of the input x of the n-th layer, follows from the chain rule:

$$\nabla x_j^{(n)} := \frac{\partial y_t}{\partial x_j^{(n)}} = \sum_i \frac{\partial y_t}{\partial x_i^{(n-1)}} \cdot \frac{\partial x_i^{(n-1)}}{\partial x_j^{(n)}}$$
If we define $L^{(n)}(X, Y)$ as the operation of layer n on two tensors X, Y (here the feature map and the weights), applying the Deep Taylor Decomposition gives the relevance:

$$R_j^{(n)} = \mathcal{G}\left(X, Y, R^{(n-1)}\right) = \sum_i \frac{X_j \frac{\partial L_i^{(n)}(X, Y)}{\partial X_j}}{\sum_{j'} X_{j'} \frac{\partial L_i^{(n)}(X, Y)}{\partial X_{j'}}} R_i^{(n-1)}$$
deep taylor decomposition uses taylor approximation to find relevance http://arxiv.org/abs/1512.02479
It locally approximates the output with a first-order (gradient) term ~ I only understood this up to a point.
The conservation rule dictates that the sum of the relevances in the n-th layer and the sum of the relevances in the (n-1)-th layer must be equal: $\sum_j R_j^{(n)} = \sum_i R_i^{(n-1)}$.
This also comes from the paper above, and means that f(x) equals the total relevance.
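The generic rule and its conservation property can be sketched with NumPy for a single linear layer (the function name and the eps stabilizer are my own, not from the paper's code):

```python
import numpy as np

def lrp_linear(x, w, relevance_out, eps=1e-9):
    """Generic Deep-Taylor/LRP rule for a linear layer z = x @ w:
    relevance flowing into output i is redistributed to input j in
    proportion to its contribution x_j * w[j, i]."""
    z = x @ w                                   # pre-activations, shape [out]
    contrib = x[:, None] * w                    # x_j * w[j, i], shape [in, out]
    # sign-matched eps stabilizes near-zero denominators
    denom = z + eps * np.where(z >= 0, 1.0, -1.0)
    return (contrib / denom[None, :]) @ relevance_out  # shape [in]

rng = np.random.default_rng(0)
x = rng.normal(size=4)
w = rng.normal(size=(4, 3))
R_out = np.abs(rng.normal(size=3))
R_in = lrp_linear(x, w, R_out)
```

Each column of `contrib / denom` sums to (almost exactly) 1, so the total relevance is conserved: `R_in.sum()` matches `R_out.sum()`.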
The original LRP paper assumes ReLU activations, so only positive values appear
- $v^+$ : max(0, v)
But with an activation like GELU, negative values can appear.
So they changed it to propagate relevance only through the subset of index pairs with a positive contribution (…? I don't know what difference this makes)
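As far as I understand it, the modification keeps only the (input, weight) index pairs whose contribution is positive; a minimal sketch under that assumption (my own naming, single linear layer):

```python
import numpy as np

def lrp_linear_positive(x, w, relevance_out, eps=1e-9):
    """Modified rule: relevance is propagated only through the subset of
    (j, i) index pairs with a positive contribution x_j * w[j, i],
    instead of relying on ReLU to make every contribution positive."""
    contrib = np.maximum(x[:, None] * w, 0.0)   # keep positive pairs only
    denom = contrib.sum(axis=0, keepdims=True) + eps
    return (contrib / denom) @ relevance_out
```

One visible difference from the generic rule: with non-negative incoming relevance the result stays non-negative even when the inputs (e.g., after GELU) are negative.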
And the very first relevance is initialized as a one-hot vector for the target class t.
non-parametric relevance propagation
The transformer has two operations that mix two feature-map tensors (unlike before, where a feature map was mixed with a weight tensor): (1) skip connections and (2) matrix multiplication.
Given two tensors u, v, the two operators are defined as Add(u, v) = u + v (skip connection) and Mul(u, v) = u·v (matrix multiplication), and relevance is propagated to each operand with the generic rule above.
$R^u$ is the relevance attributed to u and $R^v$ the relevance attributed to v. On skip connections the total relevance tended to blow up, so a normalization is added that rescales $R^u$ and $R^v$ so their sum again matches the incoming relevance.
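A simplified sketch of the normalization idea (not the paper's exact formula; assumes non-negative branch relevances):

```python
import numpy as np

def normalize_skip_relevance(R_u, R_v, incoming_total):
    """Naively propagating relevance through Add(u, v) can inflate the
    total; rescale both branches by a common factor so they again sum
    to the incoming relevance, restoring the conservation rule."""
    total = R_u.sum() + R_v.sum()
    scale = incoming_total / total
    return R_u * scale, R_v * scale
```

After rescaling, `R_u.sum() + R_v.sum()` equals `incoming_total`, so conservation holds across the skip connection.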
Relevance and gradient diffusion
Applying the above procedure to the self-attention operation and combining the relevance with the gradient gives, per block,

$$\bar{A}^{(b)} = I + \mathbb{E}_h\left[\left(\nabla A^{(b)} \odot R^{(n_b)}\right)^{+}\right]$$

$A^{(b)}$ : attention map of the b-th block; $E_h$ : average over the heads dimension; only the positive part of $\nabla A^{(b)} \odot R^{(n_b)}$ is kept.
For the final map, as in rollout, the per-block maps are simply multiplied iteratively: $C = \bar{A}^{(1)} \cdot \bar{A}^{(2)} \cdots \bar{A}^{(B)}$.
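The two aggregation schemes side by side, as I understand them (NumPy sketch; the function names and the per-block arrays `attns`, `grads`, `rels` of shape [heads, s, s] are my own conventions):

```python
import numpy as np

def attention_rollout(attns):
    """Baseline rollout: multiply (I + head-averaged attention),
    row-normalized, across all blocks."""
    s = attns[0].shape[-1]
    joint = np.eye(s)
    for A in attns:                                   # A: [heads, s, s]
        A_bar = np.eye(s) + A.mean(axis=0)            # identity = skip connection
        A_bar /= A_bar.sum(axis=-1, keepdims=True)    # keep rows stochastic
        joint = joint @ A_bar
    return joint

def relevance_rollout(grads, rels):
    """The paper's variant: weight each attention map elementwise by the
    positive part of (gradient * relevance), average over heads, add the
    identity for the skip connection, then multiply across blocks."""
    s = grads[0].shape[-1]
    joint = np.eye(s)
    for G, R in zip(grads, rels):                     # G, R: [heads, s, s]
        A_bar = np.eye(s) + np.maximum(G * R, 0.0).mean(axis=0)
        joint = joint @ A_bar
    return joint
```

Note that in the weighted variant the raw attention map enters only through its gradient and relevance, which is what makes the result class-specific.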
Obtaining the image relevance map
The result is a matrix C of s x s.
Each row shows how that token relates to other tokens in the relevance map
Since this study only considers classification models, the relevance scores are taken from the row of C for the [CLS] token. For ViT, the [CLS] entry is dropped from the sequence of length s, the remaining s-1 patch scores are reshaped to $\sqrt{s-1} \times \sqrt{s-1}$, and the result is interpolated back up to the input resolution.
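That extraction step, sketched for ViT-B/16-style inputs (224x224 image, 196 patches plus [CLS]; `np.kron` does nearest-neighbour upsampling here where the paper interpolates, and the function name is my own):

```python
import numpy as np

def cls_relevance_to_heatmap(C, image_size=224):
    """Take the [CLS] row of the s x s relevance matrix C (assuming
    [CLS] is token 0), drop its own entry, reshape the s-1 patch scores
    to a sqrt(s-1) x sqrt(s-1) grid, and upsample to the image size."""
    cls_row = C[0, 1:]                          # relevance of each patch token
    side = int(round(np.sqrt(cls_row.size)))    # 14 for ViT-B/16
    grid = cls_row.reshape(side, side)
    scale = image_size // side                  # 16-pixel patches
    return np.kron(grid, np.ones((scale, scale)))  # nearest-neighbour upsample
```

For s = 197 this yields a 14x14 grid blown up to a 224x224 heatmap.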
Result
qualitative
perturbation
Mask out the pixels the method says are important and see how top-1 accuracy changes; in the positive test the most important pixels are erased first, so a faithful explanation makes accuracy drop quickly (lower AUC is better).
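The reported AUC can be sketched as the area under the accuracy-vs-masked-fraction curve (trapezoid rule; the helper name is my own):

```python
import numpy as np

def perturbation_auc(fractions, accuracies):
    """Area under the top-1-accuracy curve as progressively larger
    fractions of 'important' pixels are masked. For the positive test a
    faithful explanation makes accuracy fall fast, so lower is better."""
    f = np.asarray(fractions, dtype=float)
    a = np.asarray(accuracies, dtype=float)
    # trapezoid rule over the (fraction, accuracy) curve
    return float(np.sum((f[1:] - f[:-1]) * (a[1:] + a[:-1]) / 2.0))
```

For example, masking 0% / 50% / 100% of pixels with accuracies 0.8 / 0.2 / 0.0 gives an AUC of 0.3.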
segmentation
token-f1