kyegomez / MultiQueryAttention
A simple PyTorch implementation of high-performance Multi-Query Attention (MQA), the attention variant in which all query heads share a single key/value head.
☆15 · Updated last year
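The core trick in Multi-Query Attention is that the key and value projections produce a single head that is broadcast across all query heads, which shrinks the KV cache and speeds up autoregressive decoding. A minimal sketch of the idea is below; the class name, signatures, and shapes are illustrative assumptions, not this repository's actual API:

```python
# Sketch of Multi-Query Attention: H query heads, one shared K/V head.
# Illustrative only -- not the API of kyegomez/MultiQueryAttention.
import torch
import torch.nn as nn


class MultiQueryAttention(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.head_dim = dim // heads
        self.q_proj = nn.Linear(dim, dim)
        # A single K head and a single V head, instead of `heads` of each.
        self.k_proj = nn.Linear(dim, self.head_dim)
        self.v_proj = nn.Linear(dim, self.head_dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        # Queries: (b, heads, n, head_dim)
        q = self.q_proj(x).view(b, n, self.heads, self.head_dim).transpose(1, 2)
        # Keys/values: (b, 1, n, head_dim), broadcast across all query heads.
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5  # (b, heads, n, n)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)  # (b, n, dim)
        return self.out_proj(out)
```

Compared with standard multi-head attention, only the K/V projections change (from `dim -> dim` to `dim -> head_dim`), so the memory held for cached keys and values during generation drops by a factor of `heads`.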
Related projects
Alternatives and complementary repositories for MultiQueryAttention
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling · ☆35 · Updated 11 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" · ☆93 · Updated last month
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch · ☆52 · Updated last week
- HGRN2: Gated Linear RNNs with State Expansion · ☆49 · Updated 3 months ago
- A repository for research on medium-sized language models · ☆74 · Updated 6 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" · ☆36 · Updated last year
- PyTorch implementation of the model from "Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities" · ☆24 · Updated last week
- GoldFinch and other hybrid transformer components · ☆40 · Updated 4 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" · ☆23 · Updated this week
- Implementation of the LDP module block in PyTorch and Zeta from the paper "MobileVLM: A Fast, Strong and Open Vision Language Assistant …" · ☆14 · Updated 8 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN from the NeurIPS 2023 paper "Hierarchically Gated Recurrent Neural Network for Sequence Modeling" · ☆61 · Updated 7 months ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" · ☆33 · Updated last month
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335) · ☆28 · Updated 8 months ago
- Reference implementation for "Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model" · ☆41 · Updated 10 months ago
- Implementation of a Light Recurrent Unit in PyTorch · ☆46 · Updated last month
- Linear Attention Sequence Parallelism (LASP) · ☆64 · Updated 5 months ago
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork" · ☆31 · Updated 11 months ago
- PyTorch implementation of Soft MoE by Google Brain from "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) · ☆66 · Updated last year
- Scaling Sparse Fine-Tuning to Large Language Models · ☆17 · Updated 9 months ago
- ResiDual: Transformer with Dual Residual Connections, https://arxiv.org/abs/2304.14802 · ☆88 · Updated last year
- Implementation of Agent Attention in PyTorch · ☆86 · Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… · ☆43 · Updated 4 months ago
- Exploring an idea that sets aside efficiency and carries out attention across each edge between the nodes (tokens) · ☆43 · Updated last month
- An unofficial PyTorch implementation of "Efficient Infinite Context Transformers with Infini-attention" · ☆41 · Updated 3 months ago
- A context window 32 times longer than vanilla Transformers and up to 4 times longer than memory-efficient Transformers · ☆43 · Updated last year