This is a simple PyTorch implementation of high-performance Multi-Query Attention.
☆16 · Aug 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for MultiQueryAttention
Users that are interested in MultiQueryAttention are comparing it to the libraries listed below.
- ☆19 · Feb 2, 2026 · Updated 2 months ago
- Official Implementation for NorMuon paper ☆65 · Mar 11, 2026 · Updated last month
- ☆16 · Sep 17, 2024 · Updated last year
- ☆23 · Mar 7, 2025 · Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆42 · Mar 31, 2025 · Updated last year
- ☆30 · Dec 23, 2025 · Updated 3 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated 11 months ago
- Spatial Spectral Machine Learning ☆14 · Oct 15, 2025 · Updated 5 months ago
- ☆20 · Oct 25, 2022 · Updated 3 years ago
- ☆26 · May 24, 2023 · Updated 2 years ago
- Benchmark for Biophysical Sequence Optimization Algorithms ☆21 · May 21, 2025 · Updated 10 months ago
- ☆39 · May 20, 2025 · Updated 10 months ago
- Parallel Self-Adjusting Computation ☆16 · Jul 5, 2021 · Updated 4 years ago
- ☆13 · Mar 13, 2023 · Updated 3 years ago
- ☆19 · Jun 13, 2024 · Updated last year
- PyTorch Implementation: Code for the paper "Generalizing to Unseen Domains via Adversarial Data Augmentation", NeurIPS 2018. Origin Tenso… ☆14 · Sep 17, 2020 · Updated 5 years ago
- Experiments on Multi-Head Latent Attention ☆101 · Aug 19, 2024 · Updated last year
- ☆12 · Jul 25, 2020 · Updated 5 years ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆39 · Jun 23, 2025 · Updated 9 months ago
- Implementation of the paper 'Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance' (EMNLP 2025) ☆27 · Dec 16, 2025 · Updated 3 months ago
- ☆24 · Feb 16, 2022 · Updated 4 years ago
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- Smoothing video traffic to make it a friendlier internet neighbor ☆14 · Apr 23, 2024 · Updated last year
- Repository for the DPP'23 course ☆11 · May 2, 2024 · Updated last year
- A swarm of LLM agents that will help you test, document, and productionize your code! ☆16 · Mar 30, 2026 · Updated last week
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆32 · May 28, 2025 · Updated 10 months ago
- Generic library for neural collapse and several derivative works on the phenomenon. ☆18 · Apr 14, 2025 · Updated 11 months ago
- ☆45 · Jun 7, 2024 · Updated last year
- This repository is the official implementation of Generalized Data Weighting via Class-level Gradient Manipulation (NeurIPS 2021)(http://… ☆22 · Oct 8, 2022 · Updated 3 years ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and … ☆11 · Jun 18, 2024 · Updated last year
- This repository provides a framework to serve LLM(Large Language Model) based applications such as Chatbot. ☆18 · Apr 20, 2023 · Updated 2 years ago
- The official implementation of the paper "Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models" (NeurIPS 2025 Pos… ☆70 · Sep 29, 2025 · Updated 6 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆47 · Feb 13, 2025 · Updated last year
- The codes of our paper "ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion" ☆14 · Jun 29, 2025 · Updated 9 months ago
- ☆11 · Sep 20, 2024 · Updated last year
- Vietnamese GPT-J API service deployed with Docker & Helm chart ☆10 · Dec 11, 2022 · Updated 3 years ago
- Code for EMNLP2020 paper: "Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space" ☆26 · May 10, 2021 · Updated 4 years ago
- A very simple performing matrix multiplication example for CPU / CUDA / METAL using GGML / llama.cpp ☆13 · Jul 7, 2024 · Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Sep 23, 2024 · Updated last year