This is a simple PyTorch implementation of high-performance Multi-Query Attention.
☆16 · Aug 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for MultiQueryAttention
Users that are interested in MultiQueryAttention are comparing it to the libraries listed below.
- Official Implementation for NorMuon paper ☆70 · Apr 30, 2026 · Updated 3 weeks ago
- ☆16 · Sep 17, 2024 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Jul 17, 2024 · Updated last year
- The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model… ☆16 · Dec 11, 2023 · Updated 2 years ago
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆43 · Mar 31, 2025 · Updated last year
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated last year
- This repository is the official implementation of Bidirectional Learning for Offline Infinite-width Model-based Optimization (NeurIPS 202… ☆14 · Jan 19, 2023 · Updated 3 years ago
- Spatial Spectral Machine Learning ☆14 · Oct 15, 2025 · Updated 7 months ago
- ☆20 · Oct 25, 2022 · Updated 3 years ago
- ☆26 · May 24, 2023 · Updated 2 years ago
- Calculating FLOPs of Pre-trained Models in NLP ☆18 · Mar 29, 2021 · Updated 5 years ago
- [ACMMM 2022] ReCoRo: Region-Controllable Robust Light Enhancement by User-Specified Imprecise Masks ☆15 · Feb 6, 2023 · Updated 3 years ago
- ☆39 · May 20, 2025 · Updated last year
- Parallel Self-Adjusting Computation ☆16 · Jul 5, 2021 · Updated 4 years ago
- ☆13 · Mar 13, 2023 · Updated 3 years ago
- PyTorch Implementation: Code for the paper "Generalizing to Unseen Domains via Adversarial Data Augmentation", NeurIPS 2018. Origin Tenso… ☆14 · Sep 17, 2020 · Updated 5 years ago
- Official Code for Guided Trajectory Generation with Diffusion Models for Offline Model-based Optimization (NIPS 2024) ☆23 · Aug 15, 2024 · Updated last year
- Experiments on Multi-Head Latent Attention ☆101 · Aug 19, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated 11 months ago
- ☆12 · Jul 25, 2020 · Updated 5 years ago
- A swarm of LLM agents that will help you test, document, and productionize your code! ☆19 · May 11, 2026 · Updated last week
- 📰 Must-read papers on Diffusion Models for Text Generation 🔥 ☆19 · Jun 21, 2024 · Updated last year
- Website for CSE 234, Winter 2025 ☆15 · Mar 24, 2025 · Updated last year
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- ☆30 · May 24, 2025 · Updated 11 months ago
- Smoothing video traffic to make it a friendlier internet neighbor ☆14 · Apr 23, 2024 · Updated 2 years ago
- Repository for the DPP'23 course ☆11 · May 2, 2024 · Updated 2 years ago
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆32 · May 28, 2025 · Updated 11 months ago
- Generic library for neural collapse and several derivative works on the phenomenon. ☆18 · Apr 14, 2025 · Updated last year
- A tutorial for learning the A* algorithm, with the code shared for more people. Let's learn together. ☆16 · Apr 5, 2018 · Updated 8 years ago
- ☆13 · May 11, 2023 · Updated 3 years ago
- This repository provides a framework to serve LLM(Large Language Model) based applications such as Chatbot. ☆18 · Apr 20, 2023 · Updated 3 years ago
- The official implementation of the paper "Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models" (NeurIPS 2025 Pos… ☆73 · Sep 29, 2025 · Updated 7 months ago
- The codes of our paper "ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion" ☆14 · Jun 29, 2025 · Updated 10 months ago
- Triangle in practice! Triton ☆16 · Feb 15, 2024 · Updated 2 years ago
- ☆11 · Sep 20, 2024 · Updated last year
- Vietnamese GPT-J API service deployed with Docker & Helm chart ☆10 · Dec 11, 2022 · Updated 3 years ago
- A very simple example of performing matrix multiplication on CPU / CUDA / METAL using GGML / llama.cpp ☆13 · Jul 7, 2024 · Updated last year
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" using PyTorch and Zeta ☆13 · Nov 11, 2024 · Updated last year