image

paper , code

TL;DR

problem : few-shot classification์€ ๊ฐ™์€ domain(=ImageNet ๋‚ด์—์„œ unseen label์„ ์˜ˆ์ธก)์—์„œ๋Š” ์ž˜ ์ž‘๋™๋˜์ง€๋งŒ, ๋‹ค๋ฅธ ๋„๋ฉ”์ธ์œผ๋กœ few-shot์„ ํ•  ๊ฒฝ์šฐ ์ž˜ ์ž‘๋™๋˜์ง€ ์•Š์Œ(ImageNet์œผ๋กœ ํ›ˆ๋ จ๋œ๊ฒŒ CUB ๋ฐ์ดํ„ฐ๋กœ few-shot test๋ฅผ ํ•˜๋ฉด ์ž˜ ์•ˆ๋‚˜์˜ด) solution : feature encoder์— feature-wise transformation layer(affine ๋ณ€ํ™˜)๋ฅผ ์ถ”๊ฐ€ํ•˜์˜€๊ณ  ์ด๋•Œ์˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋Š” learning-to-learn ๋ฐฉ๋ฒ•๋ก ์œผ๋กœ ํ•™์Šต๋จ. result : MatchingNet, RelationNet, Graph Neural Network์— ์œ„์˜ feature-wise transformation์„ ์ ์šฉํ–ˆ์„ ๋•Œ generalization ์„ฑ๋Šฅ์ด ์ข‹์•˜์Œ.

details

  • domain adaption / generalization์˜ ์ฐจ์ด๋Š” generalization์˜ ๊ฒฝ์šฐ ํ•™์Šต ๋‹จ๊ณ„์—์„œ unseen domain์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  generalize ํ•ด์•ผ ํ•จ.
  • ์šฐ๋ฆฌ๋Š” domain generalization ๋ฌธ์ œ๋ฅผ few-shot ์…‹ํŒ…์—์„œ novelํ•œ ํด๋ž˜์Šค๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฌธ์ œ๋กœ ๋ฐ”๊ฟˆ.

3.1. Preliminaries

  • few-shot terms

    • N_w : # of categories
    • N_s : # of labeled examples for each categories
  • ์•„๋ž˜ ๊ทธ๋ฆผ์€ 3 way 3 shot few shot์˜ ์˜ˆ์‹œ image

  • metric-based์˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ feature encoder E์™€ metric function M์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Œ.

  • ๊ฐ iteration์—์„œ N_w๊ฐœ์˜ ์นดํ…Œ๊ณ ๋ฆฌ๋ฅผ ๋ฝ‘๊ณ  task T๋ฅผ ๋งŒ๋“ ๋‹ค. input image๋ฅผ X, ์ด์— ํ•ด๋‹นํ•˜๋Š” label์„ Y๋ผ๊ณ  ํ•˜๊ณ . task T๋Š” support set์ธ S={(X_s, T_s)}์™€ query set์ธ {(X_q, Y_q)}๋กœ ๊ตฌ์„ฑ๋œ๋‹ค.

  • feature encoder E๋Š” support์™€ query ์ด๋ฏธ์ง€์˜ feature๋ฅผ ๋ฝ‘๊ณ  metric function M์— ๋„ฃ์–ด support image์˜ label์„ ์ฐธ๊ณ ํ•˜์—ฌ query image์˜ ์นดํ…Œ๊ณ ๋ฆฌ๋ฅผ ์˜ˆ์ธกํ•œ๋‹ค. image

  • ํ•™์Šต ๋ชฉํ‘œ ํ•จ์ˆ˜๋Š” query set์— ๋Œ€ํ•œ ์ด๋ฏธ์ง€์˜ classification loss์ด๋‹ค. image

  • ๋‹ค์–‘ํ•œ metric-based ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ฃผ์š”ํ•œ ์ฐจ์ด์ ์€ ์ด๋ฏธ์ง€ ํ”ผ์ณ๋ฅผ ๋ฝ‘๋Š” E์˜ ์•„ํ‚คํ…์ณ์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์ง„๋‹ค. ๊ฐ€๋ น MatchingNet์€ LSTM, RelationNet์€ CNN, GNN์€ GCN

  • ํ•™์Šตํ•  ๋•Œ seen domain๋“ค๋กœ trainingํ•˜๊ณ  ํ‰๊ฐ€๋Š” unseen domain์— ๋Œ€ํ•˜์—ฌ ํ•˜์˜€๋‹ค.

3.2. feature-wise transformation layer

image
  • ์šฐ๋ฆฌ์˜ ๋ชฉํ‘œ๋Š” unseen ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด generalization์„ ๋” ์ž˜ํ•˜๊ธฐ ์œ„ํ•จ์ธ๋ฐ, metric function M์ด seen domain์— overfitting๋˜๊ธฐ ์‰ฌ์šฐ๋ฏ€๋กœ ์ด๋ฅผ ๋ง‰์•„์ค˜์•ผํ•œ๋‹ค.

  • ์ง๊ด€์ ์œผ๋กœ, feature encoder E์— affine transformation์„ ์ ์šฉํ•˜๋ฉด ๋” ๋‹ค์–‘ํ•œ ๋ถ„ํฌ๋ฅผ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ์„ ๊ฒƒ ๊ฐ™๋‹ค.

  • hyper parameter๋Š” ์•„ํ•€ ๋ณ€ํ™˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ sampling ํ•˜๊ธฐ ์œ„ํ•œ standard dev๋ฅผ ๋œปํ•œ๋‹ค. image

  • batch norm ์ดํ›„์— ์•„๋ž˜์˜ feature-wise transformation layer๋ฅผ ์ ์šฉํ•œ๋‹ค. image

3.3 Learning the feature-wise transformation layers(=FT layer)

  • ์œ„์˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฒฝํ—˜์ ์œผ๋กœ ์„ ํƒํ•  ์ˆ˜๋„ ์žˆ๊ฒ ์ง€๋งŒ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ณ  ์‹ถ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ด๋ฅผ ์œ„ํ•ด learning-to-learn ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๋””์ž์ธํ–ˆ๋‹ค. ์ฃผ์š” ์•„์ด๋””์–ด๋Š” FT ๋ ˆ์ด์–ด๋ฅผ ์ ์šฉํ•˜์—ฌ seen domain์— ๋Œ€ํ•ด์„œ ํ•™์Šตํ•œ ๊ฒƒ์ด unseen domain์— ๋Œ€ํ•ด์„œ๋„ ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋‚ด๊ฒŒ ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. image

๊ฐ training iter t ์—์„œ seen domain ์ค‘ samplingํ•ด์„œ pseudo-seen domain(ps)๊ณผ pseudo-unseen domain(pu)๋ฅผ ๋งŒ๋“ ๋‹ค. FT layer์— ๋Œ€ํ•ด ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ ์šฉํ•˜์—ฌ feature encoder์™€ metric function์„ ์ ์šฉํ•˜๊ณ  seen domain task์— ๋Œ€ํ•ด์„œ๋งŒ loss๋ฅผ ๊ตฌํ•œ๋‹ค. image

generalization์„ ์ธก์ •ํ•˜๋Š” ๋‹จ๊ณ„์—์„œ๋Š” 1) ๋ชจ๋ธ์˜ FT ๋ ˆ์ด์–ด๋ฅผ ์ œ๊ฑฐํ•˜๊ณ  2) pseudo-unseen task์— ๋Œ€ํ•ด์„œ ์—…๋ฐ์ดํŠธ ๋œ ๋ชจ๋ธ์˜ classification loss๋ฅผ ๊ตฌํ•˜์—ฌ ๊ณ„์‚ฐํ•œ๋‹ค. ์ฆ‰, image

๋งˆ์ง€๋ง‰์œผ๋กœ, ์œ„์˜ loss๋Š” FT ๋ ˆ์ด์–ด์˜ ํšจ์œจ์„ฑ์„ ๋ฐ˜์˜ํ•˜๋ฏ€๋กœ, ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์•„๋ž˜์™€ ๊ฐ™์ด ์—…๋ฐ์ดํŠธ ํ•œ๋‹ค. image

์ฆ‰ metric-based model๊ณผ feature-wise transformation layer(FT)๋Š” ํ•™์Šต๋‹จ๊ณ„์—์„œ ํ•จ๊ป˜ ํ•™์Šต๋œ๋‹ค.

Experimental Results

  • FT : ๊ฒฝํ—˜์ƒ ๊ณ ๋ฅธ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋กœ FT layer ์„ค์ •ํ–ˆ์„ ๋•Œ image

  • LFT : FT ๋ ˆ์ด์–ด์˜ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ํ•™์Šต ๊ฐ€๋Šฅํ•  ๋•Œ, image

  • ๋„๋ฉ”์ธ๋ณ„ tSNE ๊ฒฐ๊ณผ. ๋„๋ฉ”์ธ๋ผ๋ฆฌ ์ž˜ ์„ž์—ฌ์žˆ์Œ -> cross-domain adapt๋ฅผ ์ž˜ํ•  ์ˆ˜ ์žˆ์Œ. image