image

paper

TL;DR

  • I read this because.. : it keeps coming up, e.g. in claims that the "star" in Q* refers to this paper
  • task : problem solving
  • problem : wouldn't the model perform better if it learned rationales?
  • idea : heuristics only go so far, so have the model generate the rationales itself. If it fails to generate one, give it the correct answer as a hint.
  • input/output : Q -> rationale - A
  • architecture : GPT-J
  • objective : CE loss
  • baseline : direct answer tuned GPT-J, Few-shot GPT-J, Few-shot LaMDA 137B
  • data : (source) GSM8K, CommonsenseQA, arithmetic problems
  • evaluation : accuracy
  • result : accuracy rises faster. It also solves problems it could not solve before (final accuracy goes up).
  • contribution : self-improvement? self-evolution? emphasis on rationales?
  • etc. :

Details

STaR

image image

๋””ํ…Œ์ผ์€ 1) ์ •๋‹ต์„ ๋งž์ถ”์ง€ ์•Š์€ ๋ฌธ์ œ์— ๋Œ€ํ•ด์„œ๋งŒ hint๋ฅผ ์คŒ 2) model finetune์„ ํ•  ๋•Œ iterativeํ•˜๊ฒŒ ํ•˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ base model์—์„œ ํ–ˆ๋‹ค๊ณ  ํ•จ. ์Œ ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด์„œ ์ ์  rationale์ด ์ข‹์•„์ง€๋Š”๊ฑด๊ฐ€? ์ด๊ฑด ๋‹ค๋ฅธ ๋ชจ๋ธ๋“ค์ด๋ž‘ ๋ฐฉ์‹์ด ์ข€ ๋‹ค๋ฅธ๋“ฏ..

์ •๋‹ต์ด ํ‹€๋ฆฐ rationale์— ๋Œ€ํ•ด์„œ filteringํ•˜๋Š” ํ”„๋กœ์„ธ์Šค๊ฐ€ RL objectvie๋ž‘ ๋น„์Šทํ•˜๋‹ค๊ณ  ์ฃผ์žฅ

image

Result

image

Color indicates the number of digits in the problem.

image

The ability to solve problems with unseen numbers of digits also emerges.

image image