fkodom / grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (https://arxiv.org/pdf/2305.13245.pdf)
☆176 · Updated last year
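For reference, a minimal sketch of the grouped-query attention computation in plain PyTorch (illustrative only; the function name, tensor shapes, and the `num_kv_heads` argument are assumptions, not this repository's API):

```python
# Minimal GQA sketch: groups of query heads share a single key/value head,
# interpolating between multi-head (num_kv_heads == num_heads) and
# multi-query (num_kv_heads == 1) attention. Illustrative only.
import torch


def grouped_query_attention(q, k, v, num_kv_heads):
    # q: (batch, num_heads, seq, head_dim)
    # k, v: (batch, num_kv_heads, seq, head_dim)
    batch, num_heads, seq_len, head_dim = q.shape
    group_size = num_heads // num_kv_heads

    # Repeat each KV head so it lines up with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)  # (batch, num_heads, seq, head_dim)
    v = v.repeat_interleave(group_size, dim=1)

    scores = q @ k.transpose(-2, -1) / head_dim**0.5
    attn = scores.softmax(dim=-1)
    return attn @ v  # (batch, num_heads, seq, head_dim)


# Example: 8 query heads sharing 2 KV heads.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, num_kv_heads=2)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```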
Alternatives and similar repositories for grouped-query-attention-pytorch
Users interested in grouped-query-attention-pytorch are comparing it to the libraries listed below.
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆325 · Updated 6 months ago
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆247 · Updated last year
- ☆196 · Updated last year
- ☆207 · Updated 10 months ago
- Root Mean Square Layer Normalization ☆252 · Updated 2 years ago
- Implementation of "Attention Is Off By One" by Evan Miller ☆196 · Updated last year
- Low-bit optimizers for PyTorch ☆130 · Updated last year
- TransMLA: Multi-Head Latent Attention Is All You Need ☆339 · Updated last month
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆170 · Updated last year
- ☆226 · Updated last year
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆124 · Updated last year
- Lion and Adam optimization comparison ☆63 · Updated 2 years ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆105 · Updated last week
- Implementation of FlashAttention in PyTorch ☆162 · Updated 7 months ago
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023) ☆342 · Updated 2 years ago
- Rectified Rotary Position Embeddings ☆380 · Updated last year
- Experiments on Multi-Head Latent Attention ☆95 · Updated last year
- Efficient Mixture of Experts for LLM Paper List ☆97 · Updated this week
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691) ☆123 · Updated last year
- Code for paper "Patch-Level Training for Large Language Models" ☆86 · Updated 9 months ago
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge ☆82 · Updated last year
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆60 · Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated last year
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆99 · Updated last year
- ☆106 · Updated last year
- qwen-nsa ☆74 · Updated 4 months ago
- ☆139 · Updated last year
- ☆270 · Updated last year
- Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien… ☆117 · Updated last week
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Updated last year