
paper

TL;DR

  • I read this because : it is part of the efficient-finetuning series I am working through
  • task : LLM finetuning
  • problem : full finetuning is parameter-inefficient (a full model copy per task), and searching for discrete prompts is also inefficient
  • idea : prepend a trainable continuous prefix to the input while keeping the PLM frozen
  • architecture : BART, GPT-2
  • objective : cross-entropy (CE) loss
  • baseline : full finetuning, finetuning only the top 2 layers, adapters
  • data : E2E, WebNLG, DART
  • result : slightly below full finetuning, slightly above adapters and top-2-layer finetuning
  • contribution : an idea similar to #113
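
The core mechanism in the TL;DR — a frozen PLM with a trainable continuous prefix prepended to the input — can be sketched roughly as follows (toy dimensions and random vectors for illustration, not the paper's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8

# Frozen PLM piece: a toy embedding table that is never updated.
embed = rng.normal(size=(100, hidden_dim))
embed.setflags(write=False)  # simulate freezing

# The only trainable parameters: continuous prefix vectors.
prefix = rng.normal(size=(5, hidden_dim))  # 5 prefix positions

x_ids = [3, 17, 42]                 # hypothetical input token ids
x_emb = embed[x_ids]                # (3, hidden_dim), from the frozen PLM
inputs = np.concatenate([prefix, x_emb], axis=0)  # prefix precedes the input

print(inputs.shape)  # (8, 8)
```

Only `prefix` would receive gradients; the rest of the model stays fixed per task.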

Details


The PLM is kept separate and frozen; the prefixes are parameterized by a matrix $P_\theta$ whose rows are hidden-dimension vectors, one per prefix position.


The authors found that directly optimizing $P_\theta$ is unstable; starting from a smaller matrix $P'_\theta$ and expanding it to the full hidden dimension with an MLP performed better. After training, the prefix $P_\theta$ can be used directly, without $P'_\theta$ or the MLP.
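The reparameterization above can be sketched like this (the dimensions and the two-layer MLP shape are illustrative assumptions; the paper tunes these per task):

```python
import numpy as np

rng = np.random.default_rng(0)
prefix_len, small_dim, hidden_dim = 10, 64, 768

# Trainable during prefix-tuning: the small matrix P'_theta and the MLP.
P_small = rng.normal(size=(prefix_len, small_dim))        # P'_theta
W1, b1 = rng.normal(size=(small_dim, small_dim)), np.zeros(small_dim)
W2, b2 = rng.normal(size=(small_dim, hidden_dim)), np.zeros(hidden_dim)

def mlp(z):
    # Two-layer MLP expanding small_dim -> hidden_dim.
    return np.tanh(z @ W1 + b1) @ W2 + b2

P_theta = mlp(P_small)        # full prefix matrix, (prefix_len, hidden_dim)

# After training: cache P_theta once; P'_theta and the MLP can be discarded.
P_cached = P_theta.copy()
print(P_cached.shape)  # (10, 768)
```

At inference only `P_cached` is needed, which is why the reparameterization adds no deployment cost.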

Results


Ablations

  • Initializing the prefix with embeddings of real words was better than random initialization in low-data settings.

Even a task-irrelevant word like “elephant” beat random initialization. With full data, the choice of initialization had little effect.

  • Performance rose with prefix length up to a task-dependent optimum (around 200 for summarization, around 10 for table-to-text), then leveled off or dropped slightly.

  • The prefix form, $[\text{PREFIX}; x; y]$, performed better than the infix form, $[x; \text{INFIX}; y]$, likely because the prefix can influence the encoding of $x$ as well as $y$.