[54] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

August 25, 2022 · 2 min · long8v

[16] Counterfactual Memorization in Neural Language Models

March 25, 2022 · 3 min · long8v

[15] Quantifying Memorization Across Neural Language Models

March 24, 2022 · 3 min · long8v