TL;DR
- I read this because.. : it is speculated that the “STaR” in the rumored Q* refers to this paper.
- task : problem solving
- problem : Wouldn’t the model perform better if it learned to generate rationales?
- idea : Since heuristics can only go so far, let the model generate its own rationales. If it can’t reach the correct answer, hint it with the correct answer (rationalization).
- input/output : Q -> rationale -> A
- architecture : GPT-J
- objective : CE loss
- baseline : direct answer tuned GPT-J, Few-shot GPT-J, Few-shot LaMDA 137B
- data : (source) GSM8K, CommonsenseQA, arithmetic problems
- evaluation : accuracy
- result : Accuracy improves faster, and the model solves problems it previously could not (final accuracy increases).
- contribution : self-improvement? self-evolution? emphasizing the role of rationales?
- etc. :
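The Q -> rationale -> A setup above can be sketched as a single training string with CE loss over the whole sequence (the template and function name are my own assumptions, not from the paper):

```python
# Minimal sketch of the Q -> rationale -> A training format (the
# template and function name are assumptions, not the paper's exact format).
def format_example(question, rationale, answer):
    """Concatenate Q, rationale, and A into one training string.

    Cross-entropy loss is taken over the whole sequence, so the model
    learns to emit the rationale before the final answer.
    """
    return f"Q: {question}\nA: {rationale} The answer is {answer}.\n"

example = format_example(
    "If 3 cars each have 4 wheels, how many wheels are there?",
    "Each of the 3 cars has 4 wheels, so there are 3 * 4 = 12 wheels.",
    "12",
)
```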
Details
STaR
Two key details: 1) hints (rationalization) are used only for questions the model answers incorrectly; 2) each round of fine-tuning starts from the original base model, not from the previous checkpoint. Does the quality of the rationales keep improving across iterations? This seems a little different from other models…
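The two details above fit into the STaR outer loop roughly as follows. This is a toy sketch: `finetune` here is a stub that just memorizes kept examples, whereas a real run samples rationales from an LLM and does gradient fine-tuning; all names are my own.

```python
# Toy sketch of the STaR outer loop. `finetune` is a stub that memorizes
# the kept examples; a real implementation samples rationales from an LLM
# and runs gradient fine-tuning. Names are assumptions, not the paper's.

def finetune(base_model, train_set):
    """Stub fine-tuning: returns a model that 'knows' the kept examples."""
    memory = {q: (r, a) for q, r, a in train_set}

    def model(question, hint=None):
        if question in memory:        # learned during "fine-tuning"
            return memory[question]
        if hint is not None:          # rationalization toward the hinted answer
            return f"reasoning toward {hint}", hint
        return "no idea", None        # unsolved without a hint

    return model


def star(base_model, dataset, n_iterations=3):
    model = base_model
    for _ in range(n_iterations):
        train_set = []
        for question, answer in dataset:
            rationale, predicted = model(question)
            if predicted != answer:
                # Detail 1: hint only the questions answered incorrectly.
                rationale, predicted = model(question, hint=answer)
            if predicted == answer:   # keep only rationales reaching the answer
                train_set.append((question, rationale, answer))
        # Detail 2: fine-tune from the ORIGINAL base model each iteration,
        # not from the previous checkpoint.
        model = finetune(base_model, train_set)
    return model


base = finetune(None, [])  # "base model" with empty memory
trained = star(base, [("2+2", "4"), ("3+3", "6")], n_iterations=2)
```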
The paper claims that filtering out incorrect rationales makes the training process approximate an RL (policy-gradient) objective.
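As I recall the paper's argument (reconstructed from memory, so indices and notation may differ slightly), the objective being approximated rewards sampled rationale–answer pairs only when the answer is correct:

```latex
J(M, X, Y) = \sum_i \mathbb{E}_{\hat{r}_i, \hat{y}_i \sim p_M(\cdot \mid x_i)}
  \left[ \mathbb{1}(\hat{y}_i = y_i) \right]

\nabla J = \sum_i \mathbb{E}
  \left[ \mathbb{1}(\hat{y}_i = y_i)\, \nabla \log p_M(\hat{y}_i, \hat{r}_i \mid x_i) \right]
```

Keeping only rationales that reach the correct answer is exactly the indicator zeroing out the gradient contribution of incorrect samples.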
Result
In the results plot, color indicates the number of digits in the arithmetic problem.
The model can solve digit counts it never saw during training.