problem : Use a language model (LM) for few-shot learning.
solution : Build an extremely large LM.
result : SOTA few-shot performance on a variety of NLP tasks.
details :

  • Comparison of zero-, one-, and few-shot performance by model size. The larger the model, the more effective in-context learning becomes.

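  • Terminology in GPT-3: zero-shot, one-shot, and few-shot refer to how many demonstrations (zero, one, or a few) are placed in the prompt. No weights are updated in any of these settings.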

  • The model architecture is very similar to GPT-2, except that the attention layers use locally banded sparse attention patterns, as in the Sparse Transformer.
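To make "locally banded" concrete, here is a minimal pure-Python sketch of a causal mask in which each position attends only to itself and the few positions just before it. The band width, the helper names, and the choice to alternate dense and banded layers starting with a dense one are all assumptions for illustration; the note does not give these details.

```python
from typing import List


def dense_causal_mask(seq_len: int) -> List[List[bool]]:
    # mask[i][j] is True when query position i may attend to key position j.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def banded_causal_mask(seq_len: int, band: int) -> List[List[bool]]:
    # Position i only sees the `band` most recent positions, itself included.
    return [[0 <= i - j < band for j in range(seq_len)] for i in range(seq_len)]


def alternating_masks(n_layers: int, seq_len: int, band: int) -> List[List[List[bool]]]:
    # Assumption: even layers are dense, odd layers are banded.
    dense = dense_causal_mask(seq_len)
    banded = banded_causal_mask(seq_len, band)
    return [dense if layer % 2 == 0 else banded for layer in range(n_layers)]


def show(mask: List[List[bool]]) -> str:
    # Render a mask as text: "x" = allowed, "." = blocked.
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


print(show(banded_causal_mask(5, 2)))
```

The printed pattern is a diagonal band rather than a full lower triangle, which is what reduces attention cost for long sequences.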

  • Model sizes: the model usually referred to as "GPT-3" has 175 billion parameters and was trained on 300 billion tokens.
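As a sanity check on the 175 billion figure, the sketch below uses the common rule of thumb that a Transformer block holds roughly 12·d_model² weights (4·d² for the attention projections plus 8·d² for an MLP that is 4x wider than d_model). The values n_layers = 96 and d_model = 12288 are assumptions taken from the paper's model-size table as recalled, not from this note, and embeddings, biases, and layer norms are ignored.

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    # Per block: Q, K, V, and output projections (4 * d^2)
    # plus two MLP matrices of shape d x 4d and 4d x d (8 * d^2).
    return 12 * n_layers * d_model ** 2


# Assumed hyperparameters for the largest model (not stated in this note).
n_params = approx_transformer_params(n_layers=96, d_model=12288)
print(f"{n_params / 1e9:.1f}B")  # 173.9B
```

The estimate lands just under 175 billion; the remaining gap is roughly what the ignored embedding tables would add.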

  • The training data is based on Common Crawl. To raise its quality, the data was preprocessed and also mixed with known high-quality corpora.

  • Larger models work best with as large a batch size as possible and a small learning rate.

  • The gradient noise scale was measured first, and the batch size was chosen based on it. (ref)
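A rough sketch of what measuring a gradient noise scale can look like, assuming the "simple" noise scale B = tr(Σ) / |G|², where G is the mean gradient and Σ is the covariance of per-example gradients. This is a naive plug-in estimate over a handful of per-example gradients, written in pure Python for illustration; it is not the estimator used in the paper, and the function name is hypothetical.

```python
from typing import List


def simple_noise_scale(per_example_grads: List[List[float]]) -> float:
    # per_example_grads[i] is the flattened gradient of example i.
    n = len(per_example_grads)
    dim = len(per_example_grads[0])

    # Mean gradient G, one entry per parameter.
    mean = [sum(g[k] for g in per_example_grads) / n for k in range(dim)]

    # tr(Sigma): sum of per-parameter sample variances.
    trace_cov = sum(
        sum((g[k] - mean[k]) ** 2 for g in per_example_grads) / (n - 1)
        for k in range(dim)
    )

    # |G|^2: squared norm of the mean gradient.
    grad_sq_norm = sum(m * m for m in mean)
    return trace_cov / grad_sq_norm


grads = [[1.0, 0.0], [3.0, 0.0]]
print(simple_noise_scale(grads))  # 0.5
```

A large value means individual gradients are noisy relative to their mean, so averaging over a bigger batch still helps; a small value means a bigger batch mostly wastes compute.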

  • Downstream Tasks :

    • Penn Treebank : a corpus built for syntactic parsing, but apparently also used to evaluate LM performance.

    • LAMBADA : a corpus in which the model is given a context and must predict the missing word. It requires handling long-range dependencies well.

    • SuperGLUE : a collection of assorted difficult NLP tasks.

    • Arithmetic : addition and subtraction of 2- to 5-digit numbers, multiplication of 2-digit numbers, and composite operations on 1-digit numbers (such as 6+(4*8)).

    • word scrambling and manipulation task

    • news article generation : annotators were asked to distinguish human-written news articles from model-generated ones. The results were compared against a deliberately weak model using a t-test.

    • learning and using novel words : the model sees a new word just once and is asked to write a sentence that uses it.
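To illustrate the word scrambling and manipulation item above, here is a small sketch of two deterministic variants (cycling the letters of a word and reversing it) and a few-shot prompt built from them. The "scrambled = answer" line format and all function names are assumptions for illustration, not the exact setup from the paper.

```python
from typing import List, Tuple


def cycle_letters(word: str, shift: int) -> str:
    # Rotate the word left by `shift` characters.
    shift %= len(word)
    return word[shift:] + word[:shift]


def reverse_word(word: str) -> str:
    return word[::-1]


def build_unscramble_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    # Each demonstration is (scrambled, answer); the query is left unanswered.
    lines = [f"{scrambled} = {answer}" for scrambled, answer in demos]
    lines.append(f"{query} =")
    return "\n".join(lines)


demos = [(reverse_word("objects"), "objects")]
query = cycle_letters("inevitably", 8)
print(build_unscramble_prompt(demos, query))
```

Scoring such a task is typically an exact string match between the model's completion and the original word.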

  • correcting English grammar : the input is given in the form "Poor English Input: <sentence>\n Good English Output: <sentence>".
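The prompt format above also shows how zero-, one-, and few-shot differ in practice: only the number of demonstrations placed before the query changes. A minimal sketch follows, assuming blank lines between demonstrations; the function name and the example sentences are illustrative, not taken from this note.

```python
from typing import List, Tuple


def build_grammar_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    # Each demonstration is a (poor_sentence, good_sentence) pair.
    blocks = [
        f"Poor English Input: {poor}\nGood English Output: {good}"
        for poor, good in demos
    ]
    # The query ends with an open "Good English Output:" for the model to complete.
    blocks.append(f"Poor English Input: {query}\nGood English Output:")
    return "\n\n".join(blocks)


demos = [
    ("I eated the purple berries.", "I ate the purple berries."),
    ("She no went to the market.", "She did not go to the market."),
]

zero_shot = build_grammar_prompt([], "He go to school yesterday.")
one_shot = build_grammar_prompt(demos[:1], "He go to school yesterday.")
few_shot = build_grammar_prompt(demos, "He go to school yesterday.")
print(few_shot)
```

The model is expected to continue the text after the final "Good English Output:", so no fine-tuning or gradient update is involved.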

  • Limitations of the GPT-3 model

    • Generation is still weak: it tends to repeat the same words.
    • It lacks common sense about physics. For example, it struggles with questions like "If I put cheese in the refrigerator, will it melt?"
    • Because it is trained with an LM objective, it is not a bidirectional LM, and it lacks information about which words are important and which are not.
    • It has never been trained on other domains such as video or images, so it lacks information about the real world.
    • It has probably seen every word a human would see in a lifetime, yet it still learns more slowly than a human.