srush / do-we-need-attention
☆166 · Updated last year
Alternatives and similar repositories for do-we-need-attention
Users interested in do-we-need-attention are comparing it to the libraries listed below.
- Some common Huggingface transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- Understand and test language model architectures on synthetic tasks. ☆195 · Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆116 · Updated 5 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆134 · Updated last year
- Language models scale reliably with over-training and on downstream tasks