
Multi-task Learning

์™œ ์ž˜๋˜๋Š”๊ฐ€?

  1. ํ•œ ํƒœ์Šคํฌ์— ๋Œ€ํ•œ ์˜ค๋ฒ„ํ”ผํŒ…์„ ๋ง‰์œผ๋ฉฐ 2) ๋ฐ์ดํ„ฐ ์–ด๊ทธ๋ฉ˜ํ…Œ์ด์…˜ ํšจ๊ณผ 3) “inductive bias"๋ฅผ ํ•™์Šต 4) ์ข‹์€ feature๋ฅผ ํ•™์Šต

hard parameter sharing vs soft parameter sharing

  • hard parameter sharing

The MTL architecture most people usually picture: a shared trunk of layers with task-specific output heads

  • soft parameter sharing

Each task gets its own network, and an L2-norm penalty on the distance between the networks' parameters keeps them from diverging too far

Recent work on MTL for deep learning

  • Deep Relationship Networks: places a matrix prior on the FC layers so the model can learn the relationships between tasks

  • Cross-stitch networks

Each task has its own network, and the activations of the networks are linearly combined through learnable $\alpha$ parameters

  • Weighting losses with uncertainty

Measure each task's uncertainty and weight the terms of the multi-task loss function accordingly -> this one seems worth reading!

Auxiliary tasks

  • related task: an auxiliary task that is closely related to the main one works best
  • adversarial: learn through the opposite of what you want. E.g., in domain adaptation, predict the input's domain and reverse the gradient of that adversarial task (Ganin, 2015)
  • hints: use a slightly easier task. E.g., train sentence sentiment prediction split into positive/negative -> reminds me of the connectivity experiment!
  • Representation learning: since the end goal is a good representation, learning a good representation itself can be the auxiliary task, e.g., language modeling or an autoencoder.
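The gradient reversal in the adversarial bullet above can be sketched as a layer that is the identity in the forward pass but flips (and scales) the gradient in the backward pass; the class and `lam` value here are hypothetical, not a real library API:

```python
import numpy as np

class GradReverse:
    # Ganin (2015)-style gradient reversal: identity forward,
    # gradient multiplied by -lam backward.
    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (hypothetical value)

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # flip the gradient sign

layer = GradReverse(lam=0.5)
x = np.array([1.0, -2.0])
y = layer.forward(x)                       # unchanged features
g = layer.backward(np.array([1.0, 1.0]))   # gradient reaching the encoder
# g == [-0.5, -0.5]: the encoder is pushed to *hurt* the domain classifier
```

Placed between the shared encoder and the domain classifier, this makes the encoder learn domain-invariant features.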

๊ฑ ๋А๋‚€ ์ 

BERT๊ฐ€ ์ •๋ง ํŒŒ๊ดด์ ์ด๊ตฌ๋‚˜ ๋А๋‚Œ ใ…‹ใ…‹