
paper

TL;DR

  • task : unsupervised representation learning
  • problem : we want to learn representations in an unsupervised manner while disentangling the salient factors of variation (digit identity, eye color, …). Generative models often produce excellent samples while the learned representation remains entangled.
  • idea : add a term to the loss so that the mutual information (MI) between a structured latent variable $c$ and the generator distribution $G(z, c)$ is high. Like the ELBO, the MI admits a lower bound, and the posterior in that bound is approximated with a neural network.
  • architecture : the generative model is a DCGAN; $Q$ shares the CNN with the discriminator, with one extra fully connected layer on top that outputs $Q(c|x)$.
  • objective : GAN loss − mutual information lower bound
  • baseline : vanilla GAN
  • data : MNIST, the 3D face dataset from DC-IGN, Street View House Numbers (SVHN), CelebA
  • result : varying the code changes the generated samples in an interpretable way. Simply training a plain GAN with $c$ appended does not maximize the mutual information as much as InfoGAN does.
  • contribution : a GAN with an interpretable latent vector!
  • limitations, or parts I didn't understand :
  1. The categorical code $c$ is sampled at random, so how does a single index come to correspond to a single digit? For instance, for a '1' image, couldn't $c = 3$ and $c = 5$ both produce it equally well, and conversely couldn't $c = 5$ appear for both a 1 and a 2? How is $c$-aware generation possible at all?

-> GAN์ด๋ผ ์ด๋ฏธ์ง€๊ฐ€ ‘1’๋กœ ๋“ค์–ด๊ฐ”๋‹ค๋Š”๊ฑด ์—†์Œ! ์ฆ‰, VAE ์ฒ˜๋Ÿผ Reconstructํ•˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€๊ฐ€ fake์ธ์ง€ real์ธ์ง€ ๊ตฌ๋ถ„ํ•˜๋ฉด์„œ ํ•™์Šต๋˜๋Š”๊ฑฐ์ž„! ๊ทธ๋Ÿฌ๋ฏ€๋กœ ์–ด๋–ค latent code c๊ฐ€ 3์œผ๋กœ ๋“ค์–ด๊ฐ”์œผ๋ฉด 3๊ฐ™์€ ๊ทธ๋ฆผ์ด ๋‚˜์˜ค๋„๋ก mutual information์„ ๋„ฃ์–ด์ฃผ๋Š” ๋“ฏ. ์ฆ‰ ๊ฑฑ์ •ํ•˜๋Š” ์ƒํ™ฉ์€ ์—†๋Š”๋“ฏ.

  2. We can choose how many categorical and continuous codes go into $c$, but we cannot decide in advance what each one will learn, can we? Why is it presented as if we could? Don't we only find out post hoc?

-> Right, we cannot decide in advance. It seems that, interpreting the results after the fact, the authors point out that the codes happen to capture features we consider meaningful.
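The shared-trunk architecture from the TL;DR can be sketched in plain numpy. This is a minimal stand-in, not the paper's model: the real $D$/$Q$ trunk is a DCGAN-style CNN, and all shapes and names here (`W_trunk`, `w_d`, `W_q`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny stand-in for the shared D/Q trunk (the paper uses a
# DCGAN-style CNN; a single dense layer keeps the sketch self-contained).
D_IN, HIDDEN, N_CAT = 28 * 28, 64, 10
W_trunk = rng.normal(0, 0.01, (D_IN, HIDDEN))
w_d = rng.normal(0, 0.01, (HIDDEN, 1))        # discriminator head
W_q = rng.normal(0, 0.01, (HIDDEN, N_CAT))    # extra FC head for Q(c|x)

def forward(x):
    h = np.maximum(x @ W_trunk, 0.0)           # shared features
    p_real = 1.0 / (1.0 + np.exp(-(h @ w_d)))  # D(x): probability x is real
    logits = h @ W_q
    logits -= logits.max(axis=1, keepdims=True)
    q = np.exp(logits)
    q /= q.sum(axis=1, keepdims=True)          # Q(c|x): categorical posterior
    return p_real, q

x = rng.normal(size=(4, D_IN))                 # a fake batch of flattened images
p_real, q = forward(x)
print(p_real.shape, q.shape)                   # (4, 1) (4, 10)
```

The point of the shared trunk is that $Q$ adds almost no cost on top of $D$: only the final fully connected layer is new.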

Details

mutual information

$$I(X;Y)=\sum_{x,y}P_{X,Y}(x,y)\log\frac{P_{X,Y}(x,y)}{P_X(x)\,P_Y(y)}$$

If $X$ and $Y$ are independent, so that $P_{X,Y}(x,y)=P_X(x)P_Y(y)$, the log term vanishes and $I(X;Y)=0$.

Written in terms of entropies: $I(X;Y)=H(X)-H(X\mid Y)=H(Y)-H(Y\mid X)$.
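As a sanity check, both properties can be verified directly on a joint probability table. The joint distributions below are toy numbers (not from the paper): an independent pair gives $I = 0$, and a fully dependent pair ($Y = X$) gives $I = H(X)$ since $H(X\mid Y) = 0$.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats, computed from a joint probability table p_xy[i, j]."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal P(X)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal P(Y)
    mask = p_xy > 0                         # skip zero cells (0 log 0 = 0)
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Independent: the joint factorizes, so I(X;Y) = 0.
p_indep = np.outer([0.3, 0.7], [0.5, 0.5])

# Fully dependent: Y = X, so I(X;Y) = H(X) - H(X|Y) = H(X).
p_dep = np.diag([0.3, 0.7])
h_x = -(0.3 * np.log(0.3) + 0.7 * np.log(0.7))

print(mutual_information(p_indep))  # ~0.0
print(mutual_information(p_dep))    # equals H(X)
```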

Variational Mutual Information Maximization

$$
\begin{aligned}
I(c;G(z,c)) &= H(c)-H(c\mid G(z,c))\\
&= \mathbb{E}_{x\sim G(z,c)}\big[\mathbb{E}_{c'\sim P(c\mid x)}[\log P(c'\mid x)]\big]+H(c)\\
&= \mathbb{E}_{x\sim G(z,c)}\big[D_{\mathrm{KL}}\big(P(\cdot\mid x)\,\|\,Q(\cdot\mid x)\big)+\mathbb{E}_{c'\sim P(c\mid x)}[\log Q(c'\mid x)]\big]+H(c)\\
&\ge \mathbb{E}_{x\sim G(z,c)}\big[\mathbb{E}_{c'\sim P(c\mid x)}[\log Q(c'\mid x)]\big]+H(c)
\end{aligned}
$$

Here the inner expectation requires samples from the true posterior $P(c\mid x)$, which we cannot draw; the lemma below removes the need for them.

For random variables $X$, $Y$ and a function $f(x,y)$, under suitable regularity conditions:
$$\mathbb{E}_{x\sim X,\,y\sim Y\mid x}[f(x,y)]=\mathbb{E}_{x\sim X,\,y\sim Y\mid x,\,x'\sim X\mid y}[f(x',y)]$$

ํ•ด์„ํ•˜์ž๋ฉด ์–ด๋–ค ํ•จ์ˆ˜ f(x, y)๋ฅผ x์™€ x๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ์˜ y์— ๋Œ€ํ•ด ๊ธฐ๋Œ€๊ฐ’์„ ๊ตฌํ•˜๋ฉด x์™€ x๊ฐ€ ์ฃผ์–ด์กŒ์„๋•Œ y์™€, x’(y๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ์˜ x)์— ๋Œ€ํ•ด f(x’ y)๊ธฐ๋Œ€๊ฐ’์„ ๊ตฌํ•œ ๊ฒƒ๊ณผ ๊ฐ™๋‹ค.

์šฐ๋ฆฌ์˜ lower bound๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋จ image

์ตœ์ข…์ ์ธ loss๋Š” GAN loss์— mutual information lower bound๋ฅผ ๋บ€ ๊ฒƒ! (MI๋Š” ๋†’์„ ์ˆ˜๋ก ์ข‹์Œ) image