kyegomez / MGQA
The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"
☆14Updated last year
Alternatives and similar repositories for MGQA:
Users that are interested in MGQA are comparing it to the libraries listed below
- A simple reproducible template to implement AI research papers☆23Updated 6 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆47Updated last year
- A repository for DenseSSMs☆87Updated 11 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆34Updated 9 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆97Updated 6 months ago
- ☆102Updated last year
- ☆52Updated 8 months ago
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"☆53Updated 7 months ago
- JAX Scalify: end-to-end scaled arithmetics☆16Updated 5 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆48Updated 2 years ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆21Updated 2 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆53Updated 7 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models☆29Updated 5 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆36Updated last year
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Updated last year
- The open source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Imag…☆12Updated last year
- Structured Pruning Adapters in PyTorch☆16Updated last year
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)☆39Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆30Updated 9 months ago
- ☆16Updated 5 months ago
- The official repo of continuous speculative decoding☆25Updated this week
- A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.☆34Updated 9 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling☆32Updated last year
- The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆37Updated 5 months ago
- Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆24Updated 2 weeks ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)☆25Updated 8 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆86Updated 3 weeks ago
- Implementation of the "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with…☆27Updated 2 months ago
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24)☆29Updated 7 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"☆24Updated 4 months ago