[22] Transformers without Tears: Improving the Normalization of Self-Attention

Apr 21, 2022 Β· 2 min Β· long8v

[21] cosFormer: Rethinking Softmax in Attention

Apr 20, 2022 Β· 2 min Β· long8v

[20] Memorizing Transformer

Apr 7, 2022 Β· 3 min Β· long8v

[16] Counterfactual Memorization in Neural Language Models

Mar 25, 2022 Β· 3 min Β· long8v

[15] Quantifying Memorization Across Neural Language Models

Mar 24, 2022 Β· 3 min Β· long8v

[14] Longformer: The Long-Document Transformer

Feb 22, 2022 Β· 2 min Β· long8v

[12] BBPE: Neural Machine Translation with Byte-Level Subwords

Feb 18, 2022 Β· 2 min Β· long8v