paper

TL;DR

  • task : stochastic DNN => image classification, reinforcement learning, adversarial example
  • idea : what if we build a stochastic layer in which only the variance, not the mean, is learned?
  • architecture : LeNet-5-Caffe
  • objective : an objective suited to each task
  • baseline : VGG-like architecture, Deterministic Policy
  • data : CIFAR-10, CIFAR-100
  • result : ?
  • contribution : ?
  • limitation or parts I didn't understand : should re-read this later when I have more time

Details

DNN in stochastic setting

  • Approaches include stochastic layers, stochastic optimization techniques, and so on
  • They are used to reduce overfitting, estimate uncertainty, and make exploration more efficient in reinforcement learning
  • Training a stochastic model can be interpreted as training a kind of Bayesian model
  • One such approach replaces each deterministic weight $w_{ij}$ with $\hat w_{ij} \sim q(\hat w_{ij}|\phi_{ij})$. During training, the model then learns a distribution over each weight rather than a single point estimate.
  • At test time, however, the mean of the weight distribution is what ends up being used, via techniques such as "mean propagation" and the "weight scaling rule".
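The weight replacement described in the last two bullets can be sketched as follows. This is only a minimal illustration: the notes do not say which family $q$ belongs to, so a fully factorized Gaussian with $\phi_{ij} = (\mu_{ij}, \sigma_{ij})$ is assumed here, and the function names and toy values are hypothetical.

```python
import numpy as np

def sample_weights(mu, sigma, rng):
    """Draw one weight matrix with w_ij ~ N(mu_ij, sigma_ij^2) (assumed form of q)."""
    return mu + sigma * rng.standard_normal(mu.shape)

def stochastic_forward(x, mu, sigma, rng):
    """Training-style pass: a fresh weight sample is drawn on every call."""
    return x @ sample_weights(mu, sigma, rng)

def mean_forward(x, mu):
    """Test-style pass: use the mean of the weight distribution instead of a sample."""
    return x @ mu

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])        # one input with 3 features
mu = rng.standard_normal((3, 2))      # hypothetical learned means
sigma = np.full((3, 2), 0.1)          # hypothetical learned standard deviations

noisy = stochastic_forward(x, mu, sigma, rng)  # changes from call to call
clean = mean_forward(x, mu)                    # deterministic
```

For a single linear layer the average of many stochastic outputs converges to the mean-weight output, which is why replacing samples with the mean looks harmless as long as the mean itself is informative.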

Stochastic Neural Network

A DNN ultimately predicts a target T given an object x and weights W. Consider a stochastic neural network in which the weights W are sampled from a parametric distribution $q(W|\phi)$. During training, $\phi$ is learned from the training data (X, T), and a regularization term $R(\phi)$ is added. The objective can therefore be written as follows.

image

Techniques such as binary dropout, variational dropout, and dropout-connection are used to train this kind of model. Computing $E_{q(W|\phi)}p(t|x,W)$ exactly is usually intractable, so it is typically approximated by drawing K samples and averaging the predictions. This is called "test-time averaging".

image

To make this cheaper, the samples $\hat W_k$ are replaced by the mean $E_qW$, so only one forward pass is needed. This is called the "weight scaling rule".

image

This paper considers layers with $E_qW=0$. In that case the weight scaling rule evaluates $p(t|x, E_qW=0)$, so every prediction made with it amounts to random guessing (so presumably the authors do not use the weight scaling rule?). Because the mean carries no information and everything is stored in the variance, such layers are called "variance layers", and a network built from them is a "variance network".
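The equations in this section survive only as image placeholders. The block below is a reconstruction from the surrounding prose, not a copy of the original images. The log-likelihood form of the training objective and the maximization over $\phi$ are assumptions. The second and third lines follow directly from the descriptions of test-time averaging and the weight scaling rule.

$$
E_{q(W|\phi)} \log p(T|X,W) + R(\phi) \to \max_{\phi}
$$

$$
E_{q(W|\phi)}\, p(t|x,W) \approx \frac{1}{K}\sum_{k=1}^{K} p(t|x,\hat W_k), \qquad \hat W_k \sim q(W|\phi)
$$

$$
E_{q(W|\phi)}\, p(t|x,W) \approx p(t|x, E_qW)
$$

With $E_qW=0$, the right-hand side of the last line no longer depends on $\phi$ at all, which is the failure case discussed above.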

Variance Layer

The activation of a variance layer does not depend on $\mu_{ij}$; it depends only on the variance.

image
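A small numerical sketch may help here. It assumes zero-mean Gaussian weights $w_{ij} \sim N(0, \sigma_{ij}^2)$, so that a pre-activation can be sampled directly as $b_j = \sqrt{\sum_i a_i^2 \sigma_{ij}^2}\,\epsilon_j$ with $\epsilon_j \sim N(0,1)$. That formula is an assumption standing in for the missing image, and the function name and toy values are hypothetical.

```python
import numpy as np

def variance_layer(a, sigma, rng):
    """Zero-mean Gaussian layer: b_j = sqrt(sum_i a_i^2 * sigma_ij^2) * eps_j."""
    std = np.sqrt((a ** 2) @ (sigma ** 2))        # per-output standard deviation
    return std * rng.standard_normal(std.shape)   # eps_j ~ N(0, 1)

rng = np.random.default_rng(0)
a = np.array([1.0, 2.0, 0.5, 1.0])                # input activations
sigma = np.abs(rng.standard_normal((4, 3)))       # hypothetical learned std devs

# Many stochastic passes for the same input.
batch = np.tile(a, (50000, 1))
samples = variance_layer(batch, sigma, rng)       # shape (50000, 3)

mean_output = samples.mean(axis=0)                # close to 0: the mean carries no signal
second_moment = (samples ** 2).mean(axis=0)       # close to sum_i a_i^2 * sigma_ij^2
expected_var = (a ** 2) @ (sigma ** 2)

# Weight scaling rule with E_q W = 0: every input maps to the same zero output.
weight_scaling_output = a @ np.zeros_like(sigma)
```

The sample mean stays near zero for any input, while the second moment tracks the input through $\sigma$. This matches the claim that the information lives in the variance, and it shows why a single pass with the mean weights cannot recover it.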

Result

The paper reports good results on classification, reinforcement learning, and adversarial examples.

image image image