This is a simple PyTorch implementation of high-performance Multi-Query Attention.
☆16 · Aug 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for MultiQueryAttention
Users that are interested in MultiQueryAttention are comparing it to the libraries listed below.
- ☆16 · May 18, 2026 · Updated 3 weeks ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Jul 17, 2024 · Updated last year
- [TMLR 2025 & ICLR 2025 DeLTa] Official Implementation of Design Editing for Offline Model-based Optimization 🧬 🤖 ☆10 · Apr 17, 2025 · Updated last year
- ☆24 · Mar 7, 2025 · Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆44 · Mar 31, 2025 · Updated last year
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated last year
- ☆13 · Jul 20, 2021 · Updated 4 years ago
- Spatial Spectral Machine Learning ☆14 · Oct 15, 2025 · Updated 7 months ago
- ☆26 · May 24, 2023 · Updated 3 years ago
- ☆20 · Oct 25, 2022 · Updated 3 years ago
- [ACMMM 2022] ReCoRo: Region-Controllable Robust Light Enhancement by User-Specified Imprecise Masks ☆15 · Feb 6, 2023 · Updated 3 years ago
- ☆39 · May 20, 2025 · Updated last year
- PyTorch Implementation: Code for the paper "Generalizing to Unseen Domains via Adversarial Data Augmentation", NeurIPS 2018. Origin Tenso… ☆14 · Sep 17, 2020 · Updated 5 years ago
- Official Code for Guided Trajectory Generation with Diffusion Models for Offline Model-based Optimization (NIPS 2024) ☆23 · Aug 15, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated last year
- ☆12 · Aug 28, 2025 · Updated 9 months ago
- Medical intelligent question-answering system ☆17 · Feb 22, 2018 · Updated 8 years ago
- 📰 Must-read papers on Diffusion Models for Text Generation 🔥 ☆19 · Jun 21, 2024 · Updated last year
- Porting Postgres Server to WASM [WIP] ☆16 · Mar 6, 2021 · Updated 5 years ago
- [ICLR 2025] Official Implementation of ParetoFlow: Guided Flows in Multi-Objective Optimization 🧬🧬🧬 ☆30 · Mar 3, 2025 · Updated last year
- Smoothing video traffic to make it a friendlier internet neighbor ☆14 · Apr 23, 2024 · Updated 2 years ago
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆32 · May 28, 2025 · Updated last year
- Generic library for neural collapse and several derivative works on the phenomenon. ☆18 · Apr 14, 2025 · Updated last year
- This repository is the official implementation of Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021)(http://… ☆22 · Oct 8, 2022 · Updated 3 years ago
- ☆13 · May 11, 2023 · Updated 3 years ago
- Implementation code for the paper "Meta-learning via Language Model In-context Tuning" (ACL 2022) ☆25 · Jun 16, 2022 · Updated 3 years ago
- The official implementation of the paper "Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models" (NeurIPS 2025 Pos… ☆74 · Sep 29, 2025 · Updated 8 months ago
- The codes of our paper "ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion" ☆14 · Jun 29, 2025 · Updated 11 months ago
- Tianchi epidemic public-welfare text similarity competition ☆20 · Apr 7, 2020 · Updated 6 years ago
- Implementation of the paper 'Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance' (EMNLP 2025) ☆30 · Dec 16, 2025 · Updated 5 months ago
- ☆11 · Sep 20, 2024 · Updated last year
- Vietnamese GPT-J API service deployed with Docker & Helm chart ☆10 · Dec 11, 2022 · Updated 3 years ago
- Code for EMNLP2020 paper: "Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space" ☆26 · May 10, 2021 · Updated 5 years ago
- Hal Daume's hbc ☆20 · Jan 23, 2010 · Updated 16 years ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Sep 23, 2024 · Updated last year
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆50 · May 12, 2026 · Updated 3 weeks ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 · Nov 11, 2024 · Updated last year
- ☆12 · May 20, 2025 · Updated last year
- An innovative method designed to augment the capabilities of existing video diffusion models ☆22 · May 10, 2024 · Updated 2 years ago