
paper

TL;DR

  • task : unsupervised representation learning
  • problem : we want to learn representations in an unsupervised manner while disentangling the salient factors of variation (digit identity, eye color, …). Generative models often produce excellent samples while the learned representation remains entangled.
  • idea : add a term to the loss so that the mutual information (MI) between a structured latent variable $c$ and the generator distribution $G(z, c)$ is high. Like the ELBO, the MI admits a lower bound, and the posterior in that bound is approximated with a neural network.
  • architecture : the generative model is a DCGAN; $Q$ shares the CNN with the discriminator, with one extra fully connected layer on top that outputs $Q(c|x)$.
  • objective : GAN loss − mutual information lower bound
  • baseline : vanilla GAN
  • data : MNIST, the 3D face dataset from DC-IGN, Street View House Numbers (SVHN), CelebA
  • result : varying the code changes the generated samples in an interpretable way. Simply training a plain GAN with $c$ appended does not maximize the mutual information as much as InfoGAN does.
  • contribution : a GAN with an interpretable latent vector!
  • limitations, or parts I didn't understand :
  1. The categorical code $c$ is sampled at random, so how does a single index come to correspond to a single digit? For instance, for a '1' image, couldn't $c = 3$ and $c = 5$ both produce it equally well, and conversely couldn't $c = 5$ appear for both a 1 and a 2? How is $c$-aware generation possible at all?

-> GAN์ด๋ผ ์ด๋ฏธ์ง€๊ฐ€ ‘1’๋กœ ๋“ค์–ด๊ฐ”๋‹ค๋Š”๊ฑด ์—†์Œ! ์ฆ‰, VAE ์ฒ˜๋Ÿผ Reconstructํ•˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€๊ฐ€ fake์ธ์ง€ real์ธ์ง€ ๊ตฌ๋ถ„ํ•˜๋ฉด์„œ ํ•™์Šต๋˜๋Š”๊ฑฐ์ž„! ๊ทธ๋Ÿฌ๋ฏ€๋กœ ์–ด๋–ค latent code c๊ฐ€ 3์œผ๋กœ ๋“ค์–ด๊ฐ”์œผ๋ฉด 3๊ฐ™์€ ๊ทธ๋ฆผ์ด ๋‚˜์˜ค๋„๋ก mutual information์„ ๋„ฃ์–ด์ฃผ๋Š” ๋“ฏ. ์ฆ‰ ๊ฑฑ์ •ํ•˜๋Š” ์ƒํ™ฉ์€ ์—†๋Š”๋“ฏ.

  2. We can choose how many categorical and continuous codes go into $c$, but we cannot decide in advance what each one will learn, can we? Why is it presented as if we could? Don't we only find out post hoc?

-> Right, we cannot decide in advance. It seems that, interpreting the results after the fact, the authors point out that the codes happen to capture features we consider meaningful.
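The shared-trunk architecture from the TL;DR can be sketched in plain numpy. This is a minimal stand-in, not the paper's model: the real $D$/$Q$ trunk is a DCGAN-style CNN, and all shapes and names here (`W_trunk`, `w_d`, `W_q`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny stand-in for the shared D/Q trunk (the paper uses a
# DCGAN-style CNN; a single dense layer keeps the sketch self-contained).
D_IN, HIDDEN, N_CAT = 28 * 28, 64, 10
W_trunk = rng.normal(0, 0.01, (D_IN, HIDDEN))
w_d = rng.normal(0, 0.01, (HIDDEN, 1))        # discriminator head
W_q = rng.normal(0, 0.01, (HIDDEN, N_CAT))    # extra FC head for Q(c|x)

def forward(x):
    h = np.maximum(x @ W_trunk, 0.0)           # shared features
    p_real = 1.0 / (1.0 + np.exp(-(h @ w_d)))  # D(x): probability x is real
    logits = h @ W_q
    logits -= logits.max(axis=1, keepdims=True)
    q = np.exp(logits)
    q /= q.sum(axis=1, keepdims=True)          # Q(c|x): categorical posterior
    return p_real, q

x = rng.normal(size=(4, D_IN))                 # a fake batch of flattened images
p_real, q = forward(x)
print(p_real.shape, q.shape)                   # (4, 1) (4, 10)
```

The point of the shared trunk is that $Q$ adds almost no cost on top of $D$: only the final fully connected layer is new.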

Details

mutual information

$$I(X;Y)=\sum_{x,y}P_{X,Y}(x,y)\log\frac{P_{X,Y}(x,y)}{P_X(x)\,P_Y(y)}$$

If $X$ and $Y$ are independent, so that $P_{X,Y}(x,y)=P_X(x)P_Y(y)$, the log term vanishes and $I(X;Y)=0$.

Written in terms of entropies: $I(X;Y)=H(X)-H(X\mid Y)=H(Y)-H(Y\mid X)$.
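As a sanity check, both properties can be verified directly on a joint probability table. The joint distributions below are toy numbers (not from the paper): an independent pair gives $I = 0$, and a fully dependent pair ($Y = X$) gives $I = H(X)$ since $H(X\mid Y) = 0$.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats, computed from a joint probability table p_xy[i, j]."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal P(X)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal P(Y)
    mask = p_xy > 0                         # skip zero cells (0 log 0 = 0)
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Independent: the joint factorizes, so I(X;Y) = 0.
p_indep = np.outer([0.3, 0.7], [0.5, 0.5])

# Fully dependent: Y = X, so I(X;Y) = H(X) - H(X|Y) = H(X).
p_dep = np.diag([0.3, 0.7])
h_x = -(0.3 * np.log(0.3) + 0.7 * np.log(0.7))

print(mutual_information(p_indep))  # ~0.0
print(mutual_information(p_dep))    # equals H(X)
```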

Variational Mutual Information Maximization

$$
\begin{aligned}
I(c;G(z,c)) &= H(c)-H(c\mid G(z,c))\\
&= \mathbb{E}_{x\sim G(z,c)}\big[\mathbb{E}_{c'\sim P(c\mid x)}[\log P(c'\mid x)]\big]+H(c)\\
&= \mathbb{E}_{x\sim G(z,c)}\big[D_{\mathrm{KL}}\big(P(\cdot\mid x)\,\|\,Q(\cdot\mid x)\big)+\mathbb{E}_{c'\sim P(c\mid x)}[\log Q(c'\mid x)]\big]+H(c)\\
&\ge \mathbb{E}_{x\sim G(z,c)}\big[\mathbb{E}_{c'\sim P(c\mid x)}[\log Q(c'\mid x)]\big]+H(c)
\end{aligned}
$$

Here the inner expectation requires samples from the true posterior $P(c\mid x)$, which we cannot draw; the lemma below removes the need for them.

For random variables $X$, $Y$ and a function $f(x,y)$, under suitable regularity conditions:
$$\mathbb{E}_{x\sim X,\,y\sim Y\mid x}[f(x,y)]=\mathbb{E}_{x\sim X,\,y\sim Y\mid x,\,x'\sim X\mid y}[f(x',y)]$$

ํ•ด์„ํ•˜์ž๋ฉด ์–ด๋–ค ํ•จ์ˆ˜ f(x, y)๋ฅผ x์™€ x๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ์˜ y์— ๋Œ€ํ•ด ๊ธฐ๋Œ€๊ฐ’์„ ๊ตฌํ•˜๋ฉด x์™€ x๊ฐ€ ์ฃผ์–ด์กŒ์„๋•Œ y์™€, x’(y๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ์˜ x)์— ๋Œ€ํ•ด f(x’ y)๊ธฐ๋Œ€๊ฐ’์„ ๊ตฌํ•œ ๊ฒƒ๊ณผ ๊ฐ™๋‹ค.

์šฐ๋ฆฌ์˜ lower bound๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ์ •์˜๋จ image

์ตœ์ข…์ ์ธ loss๋Š” GAN loss์— mutual information lower bound๋ฅผ ๋บ€ ๊ฒƒ! (MI๋Š” ๋†’์„ ์ˆ˜๋ก ์ข‹์Œ) image