
paper : Prefix-Tuning: Optimizing Continuous Prompts for Generation (Li & Liang, 2021)

TL;DR

  • I read this because.. : part of my efficient finetuning reading series
  • task : LLM finetuning
  • problem : full finetuning updates every parameter, which is inefficient; searching for discrete prompts is also computationally inefficient
  • idea : prepend a continuous prompt (prefix) to the input and train only that
  • architecture : BART, GPT-2
  • objective : cross-entropy loss
  • baseline : full finetuning, finetuning only the top 2 layers, adapter
  • data : E2E, WebNLG, DART
  • result : slightly below full finetuning, but somewhat better than adapter and FT-top2
  • contribution : idea similar to #113

Details


PLM์ด ๋”ฐ๋กœ ์žˆ๊ณ  prefix๋ฅผ ์œ„ํ•œ hidden ์ฐจ์›์˜ matrix $P_\theta $๊ฐ€ ์žˆ๋Š” ํ˜•ํƒœ image


Starting from a smaller matrix $P'_\theta$ and expanding it to the full size with an MLP, i.e. $P_\theta[i,:] = \mathrm{MLP}_\theta(P'_\theta[i,:])$, worked better than training $P_\theta$ directly. Once training is finished, the MLP and $P'_\theta$ can be dropped and only the final prefix $P_\theta$ is used.
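
A sketch of that reparameterization (dimensions here are for GPT-2 small and the MLP width is an assumption): only $P'_\theta$ and the MLP receive gradients, and after training the expanded output is cached as the final prefix.

```python
import torch
import torch.nn as nn

prefix_len = 10
small_dim = 512            # width of P'_theta (assumed)
full_dim = 2 * 12 * 768    # 2 (key+value) * n_layer * n_embd for GPT-2 small

p_small = nn.Parameter(torch.randn(prefix_len, small_dim))  # P'_theta
mlp = nn.Sequential(       # expands each row to the full prefix size
    nn.Linear(small_dim, small_dim),
    nn.Tanh(),
    nn.Linear(small_dim, full_dim),
)

def current_prefix():
    # P_theta[i, :] = MLP_theta(P'_theta[i, :]); recomputed at every training step
    return mlp(p_small)

# after training: drop p_small and the MLP, keep only the expanded prefix
with torch.no_grad():
    p_final = current_prefix().clone()  # shape: (prefix_len, full_dim)
```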

Results

(figure: main results table)

Ablations

  • In low-data settings, initializing the prefix with activations of real words worked better than random initialization.

ํƒœ์Šคํฌ์™€ ๊ด€๋ จ ์—†๋Š” “elephant” ๊ฐ™์€ ๊ฒƒ๋„ random ๋ณด๋‹ค ๋‚˜์•˜๋‹ค. full์ผ๋•Œ๋Š” Initialize์— ํฌ๊ฒŒ ์˜ํ–ฅ ๋ฐ›์ง€ ์•Š์•˜๋‹ค.

  • prompt ๊ธธ์ด๋Š” task ๋งˆ๋‹ค ์„ฑ๋Šฅ์˜ ์ƒํ–ฅ์„ ์ด ์žˆ์—ˆ๋‹ค ์š”์•ฝ์€ 200 / table to text๋Š” 10 image

  • Placing the prompt in front (prefix form, $[\text{prompt}; x; y]$) performed better than the infix form $[x; \text{prompt}; y]$.