
paper

TL;DR

  • I read this because.. : came across it while reading an article in Hugging Face's parameter-efficient fine-tuning repo. I'd heard about p-tuning a lot but had never actually read the paper
  • task : language model fine-tuning (knowledge probing, …)
  • problem : when fine-tuning an LLM, the parameter count is so large that transfer ability suffers in both few-shot and many-shot settings. GPT-3 works well if you give it a good prompt, but finding a good prompt takes a lot of manual effort, and performance swings wildly depending on the prompt.
  • idea : instead of searching for prompts discretely, search in a continuous space
  • architecture : feed an LLM such as BERT / GPT the template {pseudo-prompt $P_{0:i}$, $\mathbf{x}$, $P_{i+1:m}$, $\mathbf{e}(y)$} and learn the embedding of each pseudo-prompt. Since we want the prompt embeddings to be learned dependently on one another, a bi-LSTM layer is added to strengthen the embeddings.
  • objective : MLM loss
  • baseline : manual prompt, fine-tuning, discrete prompt searching, manual prompt + fine-tuning
  • data : LAMA, SuperGLUE
  • evaluation : accuracy, F1, …
  • result : better performance on most SuperGLUE tasks with GPT / BERT based models! (even beats fine-tuning)
  • contribution : moves manual prompt search into a continuous space
  • limitation / things I cannot understand : this also reminds me a bit of prompt-based CIL.. and it makes me want to try p-tuning in an MTL setting
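The template in the architecture bullet can be sketched in a few lines of PyTorch. This is a toy illustration, not the paper's code: the sizes and names (`prompt_emb`, `word_emb`, `hidden`) are hypothetical, and only the pseudo-prompt embeddings are trainable while the LM's word-embedding table stays frozen.

```python
import torch
import torch.nn as nn

hidden = 16    # toy LM embedding dim
n_prompt = 4   # pseudo-prompt tokens, split around x

# trainable pseudo-prompt embeddings (the only new parameters)
prompt_emb = nn.Embedding(n_prompt, hidden)

# frozen word-embedding table e(.) of the pretrained LM M
word_emb = nn.Embedding(100, hidden)
word_emb.weight.requires_grad_(False)

x_ids = torch.tensor([[7, 8, 9]])  # context tokens x
y_ids = torch.tensor([[3]])        # target y ([MASK] position for BERT)

p = prompt_emb.weight.unsqueeze(0)  # (1, n_prompt, hidden)
# template {P_0:i, x, P_i+1:m, e(y)} with i = 2
inputs_embeds = torch.cat(
    [p[:, :2], word_emb(x_ids), p[:, 2:], word_emb(y_ids)], dim=1
)
print(inputs_embeds.shape)  # torch.Size([1, 8, 16])
```

The resulting `inputs_embeds` would be passed to the LM via its embedding input (e.g. the `inputs_embeds` argument in Hugging Face models), so gradients flow only into the pseudo-prompt vectors.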

Details

  • $\mathcal{M}$ : pretrained LM

There are two problems with training this way: 1) the embedding space $\mathbf{e}$ of the pretrained LM $\mathcal{M}$ is already highly discrete, so if $h$ is randomly initialized, only the parameters in a small neighborhood get updated and it is easy to fall into local minima; and 2) we want the prompt tokens to be dependent on each other. To solve both, a lightweight network is added.

LSTM์ด ์ถ”๊ฐ€๋˜๊ธด ํ•˜์ง€๋งŒ LM์— ๋น„ํ•˜๋ฉด ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ๊ฑฐ์˜ ์—†๊ณ  inference ๋‹จ๊ณ„์—์„œ๋Š” lstm์€ ๊ทธ๋ƒฅ ๋ฒ„๋ฆฌ๊ณ  ํ•™์Šต๋œ ์ž„๋ฒ ๋”ฉ h๋งŒ ์“ฐ๋ฉด ๋œ๋‹ค.


Result


p-tuning์€ language model์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” freeze finetuning์„ ์ด๊ธฐ๋Š”๊ฒŒ ์‹ ๊ธฐํ•˜๊ตฐ์š”


Follow-up work

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. Inserts prompt tokens at every layer; shows this also handles hard sequence labeling tasks, which the original p-tuning struggled with, and that it works on smaller models as well. https://arxiv.org/pdf/2110.07602.pdf