paper

TL;DR

  • task : generative model
  • problem : A generative model estimates the data distribution and generates data by sampling from it. The pdf is hard to obtain because it must integrate to 1. Score matching avoids this by directly estimating the score, i.e. the gradient of log p(x) with respect to x, without estimating the pdf itself. The catch is that the score is undefined when the data lie on a low-dimensional manifold.
  • idea : Perturb the data with Gaussian noise at several magnitudes and train a single conditional score network over all noise levels. Sampling is done with Langevin dynamics (iteratively stepping x along the estimated score, plus injected noise, yields samples of x).
  • architecture : U-Net
  • objective : the difference between the output of our estimated score network $s_\theta$ on $\tilde x$ (x with Gaussian noise added) and the score of the noise distribution we added (denoising score matching)
  • baseline : PixelCNN, WGAN, BigGAN
  • data : CIFAR10, MNIST, CelebA
  • result : solid Inception score and FID (slightly behind BigGAN and MoLM)
  • contribution : a score-based model that needs neither sampling during training nor adversarial training
  • limitation or parts I did not understand : sliced score matching
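
The objective bullet above can be written out explicitly. This is a sketch in the notation I recall from the paper; the symbols $q_\sigma$, $\ell$, $\mathcal{L}$ and $\lambda$ are not defined elsewhere in these notes, and the weighting $\lambda(\sigma) = \sigma^2$ is the paper's choice as I remember it, so treat both as assumptions. For Gaussian noise the score of the noise distribution has a closed form:

$$
q_\sigma(\tilde x \mid x) = \mathcal{N}(\tilde x;\, x, \sigma^2 I)
\quad\Rightarrow\quad
\nabla_{\tilde x} \log q_\sigma(\tilde x \mid x) = -\frac{\tilde x - x}{\sigma^2}
$$

Plugging this in gives the per-noise-level loss, and the losses are averaged over the $L$ noise levels:

$$
\ell(\theta; \sigma) = \frac{1}{2}\, \mathbb{E}_{p_{\text{data}}(x)}\, \mathbb{E}_{\tilde x \sim \mathcal{N}(x, \sigma^2 I)} \left[ \left\| s_\theta(\tilde x, \sigma) + \frac{\tilde x - x}{\sigma^2} \right\|_2^2 \right]
$$

$$
\mathcal{L}(\theta; \{\sigma_i\}_{i=1}^{L}) = \frac{1}{L} \sum_{i=1}^{L} \lambda(\sigma_i)\, \ell(\theta; \sigma_i), \qquad \lambda(\sigma) = \sigma^2
$$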

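To make the Langevin sampling idea in the TL;DR concrete, here is a minimal NumPy sketch of annealed Langevin dynamics on a toy problem. It is not the paper's implementation: the trained network $s_\theta$ is replaced by the exact score of a 2-D Gaussian (`toy_score`), and `MU`, `DATA_STD`, the noise schedule, `eps` and `n_steps` are illustrative assumptions.

```python
import numpy as np

MU = np.array([3.0, -2.0])  # toy data mean (assumption, not from the paper)
DATA_STD = 0.5              # toy data std (assumption, not from the paper)


def toy_score(x, sigma):
    """Exact score of the toy data after adding N(0, sigma^2 I) noise.

    Stands in for a trained conditional score network s_theta(x, sigma).
    The perturbed distribution is N(MU, (DATA_STD^2 + sigma^2) I).
    """
    return -(x - MU) / (DATA_STD**2 + sigma**2)


def annealed_langevin(score_fn, x0, sigmas, eps=2e-5, n_steps=100, rng=None):
    """Run Langevin dynamics at each noise level, from largest to smallest."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for sigma in sigmas:
        # Step size shrinks with the noise level.
        alpha = eps * sigma**2 / sigmas[-1] ** 2
        for _ in range(n_steps):
            z = rng.standard_normal(x.shape)
            # Move along the score, then inject Gaussian noise.
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x


rng = np.random.default_rng(0)
sigmas = np.geomspace(1.0, 0.01, 10)         # decreasing noise levels
x0 = rng.uniform(-8.0, 8.0, size=(2000, 2))  # start far from the data
samples = annealed_langevin(toy_score, x0, sigmas, rng=rng)
print(samples.mean(axis=0), samples.std(axis=0))
```

The large noise levels pull the uniformly initialized points toward the data quickly, while the small ones refine them, so the printed mean and standard deviation should land close to `MU` and `DATA_STD`.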
Details