paper

TL;DR

  • task : unsupervised learning
  • problem : I want to do representation learning in an unsupervised manner, disentangling important factors (e.g., digit identity, eye color). Generative models are often great at producing samples, but the learned representation is entangled and hard to interpret.
  • idea : Add a term to the loss so that the mutual information (MI) between a structured latent code $c$ and the generator output $G(z, c)$ is high. MI admits a variational lower bound, analogous to the ELBO, in which the true posterior is approximated by a neural network.
  • architecture : The discriminator and the auxiliary network $Q$ share a DCGAN-style CNN, with one extra fully connected layer on top that outputs $Q(c|x)$.
  • objective : GAN loss minus the mutual information lower bound (weighted by $\lambda$)
  • baseline : vanilla GAN
  • data : MNIST, 3D faces (the dataset from the DC-IGN paper), Street View House Numbers (SVHN), CelebA
  • result : Varying the code $c$ changes the output in an interpretable way. Simply feeding $c$ to a vanilla GAN does not raise the mutual information nearly as much as InfoGAN does.
  • contribution : GAN with interpretable latent vector!
  • Limitations or things I don’t understand :
  1. How can one index of the categorical code come to correspond to one digit when $c$ is sampled at random during training? For example, if the same input comes in with $c = 3$ or $c = 5$, wouldn't it be restored the same either way? And conversely, if different inputs come in with the same $c = 5$? Or does it work out precisely because generation always conditions on $c$?

-> In other words, InfoGAN does not reconstruct a given image like a VAE; it learns by distinguishing whether an image is real or fake, so there is no fixed input-output pairing. If the latent code $c$ is set to 3, the mutual information term pushes the generator to produce an image from which $c = 3$ can be recovered, e.g., a picture that looks like a 3.

  2. You can decide how many categorical and continuous variables go into $c$, but you can't decide in advance what each one will learn, right?? So why is it presented as if you can?? It's only something you find out after the fact…

-> Right, it is undecidable in advance. The interpretation is post hoc: it turns out that the learned codes correspond well to the features we consider meaningful.

Details

Mutual Information

$$I(X;Y) = \sum_{x}\sum_{y} P_{X,Y}(x,y)\,\log\frac{P_{X,Y}(x,y)}{P_X(x)\,P_Y(y)}$$

If X and Y are independent so that $P_{X,Y}(x,y)=P_X(x)P_Y(y)$, the log term vanishes and $I(X;Y)=0$.

Written in terms of entropy,
$$I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X).$$
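As a quick sanity check, both forms of MI can be computed numerically for a small discrete joint distribution. This is a minimal numpy sketch (the 2×2 table is an illustrative toy, not from the paper):

```python
import numpy as np

# Toy 2x2 joint distribution P(X, Y)
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
P_x = P_xy.sum(axis=1)          # marginal P(X)
P_y = P_xy.sum(axis=0)          # marginal P(Y)

# Definition: I(X;Y) = sum_{x,y} P(x,y) log( P(x,y) / (P(x) P(y)) )
mi = np.sum(P_xy * np.log(P_xy / np.outer(P_x, P_y)))

# Entropy form: I(X;Y) = H(X) - H(X|Y)
H_x = -np.sum(P_x * np.log(P_x))
H_x_given_y = -np.sum(P_xy * np.log(P_xy / P_y))   # P_xy / P_y = P(x|y)
print(mi, H_x - H_x_given_y)    # the two values agree

# Independent case: P(x,y) = P(x) P(y)  ->  every log term is 0, so MI = 0
P_ind = np.outer(P_x, P_y)
print(np.sum(P_ind * np.log(P_ind / np.outer(P_x, P_y))))   # 0.0
```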

Variational Mutual Information Maximization

$$
\begin{aligned}
I(c;\, G(z,c)) &= H(c) - H(c \mid G(z,c)) \\
&= \mathbb{E}_{x \sim G(z,c)}\!\left[\mathbb{E}_{c' \sim P(c \mid x)}\left[\log P(c' \mid x)\right]\right] + H(c) \\
&= \mathbb{E}_{x \sim G(z,c)}\!\left[ D_{\mathrm{KL}}\!\left(P(\cdot \mid x)\,\|\,Q(\cdot \mid x)\right) + \mathbb{E}_{c' \sim P(c \mid x)}\left[\log Q(c' \mid x)\right]\right] + H(c) \\
&\ge \mathbb{E}_{x \sim G(z,c)}\!\left[\mathbb{E}_{c' \sim P(c \mid x)}\left[\log Q(c' \mid x)\right]\right] + H(c)
\end{aligned}
$$

Here we would still need to sample $c'$ from the true posterior $P(c \mid x)$, but the lemma below shows that we do not need to.

For random variables $X, Y$ and a function $f(x,y)$ under suitable regularity conditions,
$$\mathbb{E}_{x \sim X,\, y \sim Y \mid x}\left[f(x,y)\right] = \mathbb{E}_{x \sim X,\, y \sim Y \mid x,\, x' \sim X \mid y}\left[f(x',y)\right].$$

Interpreted: the expectation of $f(x,y)$, with $x$ drawn from $X$ and $y$ drawn given $x$, is unchanged if we additionally redraw $x'$ from the posterior of $X$ given $y$ and evaluate $f(x', y)$ instead. Applied here, the inner posterior sample $c' \sim P(c \mid x)$ can be replaced by the code $c$ we already sampled when generating $x$.

Our lower bound is therefore defined as
$$L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}\left[\log Q(c \mid x)\right] + H(c) \le I(c;\, G(z,c)).$$

The final loss is the GAN loss minus the mutual information lower bound (the higher the MI, the better), weighted by $\lambda$:
$$\min_{G,Q}\,\max_{D}\; V_{\mathrm{InfoGAN}}(D, G, Q) = V(D, G) - \lambda\, L_I(G, Q).$$
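For a uniform categorical code, the Monte Carlo estimate of $L_I$ is just the average $\log Q(c \mid x)$ at the sampled codes plus the constant $H(c)$. A minimal numpy sketch; `mi_lower_bound` and the random `q_logits` placeholders are illustrative names, not from the paper (in practice the logits come from the $Q$ head):

```python
import numpy as np

rng = np.random.default_rng(1)

def mi_lower_bound(c_onehot, q_logits):
    """Monte Carlo estimate of L_I = E[log Q(c|x)] + H(c)
    for a uniform categorical code c."""
    k = c_onehot.shape[1]
    # log-softmax of the Q head's logits
    log_q = q_logits - np.log(np.sum(np.exp(q_logits), axis=1, keepdims=True))
    e_log_q = np.mean(np.sum(c_onehot * log_q, axis=1))   # E[log Q(c|x)]
    h_c = np.log(k)                                       # entropy of uniform categorical
    return e_log_q + h_c

# Sample codes c ~ Uniform{0..9}, as for MNIST's 10-way categorical code
n, k = 256, 10
c = rng.integers(0, k, size=n)
c_onehot = np.eye(k)[c]

# Placeholder "Q(c|x)" logits; an untrained Q gives a low bound
q_logits = rng.normal(size=(n, k))
print(mi_lower_bound(c_onehot, q_logits))   # always <= log(10) = H(c)

# A near-perfect Q (logits concentrated on the true c) approaches H(c)
perfect = 20.0 * c_onehot
print(mi_lower_bound(c_onehot, perfect))    # close to log(10) ≈ 2.30
```

During training, $-L_I$ for the categorical part reduces to an ordinary cross-entropy loss between the sampled code and $Q$'s softmax output, which is why it is cheap to add on top of the GAN loss.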