A simple PyTorch implementation of high-performance Multi-Query Attention.
☆16 · Aug 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for MultiQueryAttention
Users that are interested in MultiQueryAttention are comparing it to the libraries listed below.
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Jul 17, 2024 · Updated last year
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated last year
- ☆26 · May 24, 2023 · Updated 3 years ago
- ☆20 · Oct 25, 2022 · Updated 3 years ago
- [ACMMM 2022] ReCoRo: Region-Controllable Robust Light Enhancement by User-Specified Imprecise Masks ☆15 · Feb 6, 2023 · Updated 3 years ago
- ☆39 · May 20, 2025 · Updated last year
- ☆13 · Mar 13, 2023 · Updated 3 years ago
- PyTorch Implementation: Code for the paper "Generalizing to Unseen Domains via Adversarial Data Augmentation", NeurIPS 2018. Origin Tenso… ☆14 · Sep 17, 2020 · Updated 5 years ago
- Experiments on Multi-Head Latent Attention ☆101 · Aug 19, 2024 · Updated last year
- ☆12 · Jul 25, 2020 · Updated 5 years ago
- Porting Postgres Server to WASM [WIP] ☆16 · Mar 6, 2021 · Updated 5 years ago
- Website for CSE 234, Winter 2025 ☆16 · Mar 24, 2025 · Updated last year
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- ☆30 · May 24, 2025 · Updated last year
- Smoothing video traffic to make it a friendlier internet neighbor ☆14 · Apr 23, 2024 · Updated 2 years ago
- Repository for the DPP'23 course ☆11 · May 2, 2024 · Updated 2 years ago
- Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity" ☆32 · May 28, 2025 · Updated last year
- ☆45 · Jun 7, 2024 · Updated 2 years ago
- Generic library for neural collapse and several derivative works on the phenomenon. ☆18 · Apr 14, 2025 · Updated last year
- POSTECH: Compiler Construction (Spring 2022) ☆11 · Mar 10, 2023 · Updated 3 years ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and … ☆12 · Jun 18, 2024 · Updated 2 years ago
- This repository provides a framework to serve LLM(Large Language Model) based applications such as Chatbot. ☆18 · Apr 20, 2023 · Updated 3 years ago
- Triangle in practice! Triton ☆16 · Feb 15, 2024 · Updated 2 years ago
- ☆11 · Sep 20, 2024 · Updated last year
- Hal Daume's hbc ☆20 · Jan 23, 2010 · Updated 16 years ago
- A very simple example of performing matrix multiplication on CPU / CUDA / METAL using GGML / llama.cpp ☆13 · Jul 7, 2024 · Updated last year
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Sep 23, 2024 · Updated last year
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆50 · May 12, 2026 · Updated last month
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 · Nov 11, 2024 · Updated last year
- Official PyTorch implementation of CD-MOE ☆12 · Mar 18, 2026 · Updated 3 months ago
- Mamba support for transformer lens ☆20 · Sep 17, 2024 · Updated last year
- An innovative method designed to augment the capabilities of existing video diffusion models ☆22 · May 10, 2024 · Updated 2 years ago
- ☆19 · Jun 23, 2026 · Updated last week
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Nov 11, 2024 · Updated last year
- T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25) ☆55 · Oct 6, 2025 · Updated 8 months ago
- Implementation of the model from "Faster sorting algorithms discovered using deep reinforcement learning" that discovered an all-new ult… ☆11 · Aug 29, 2023 · Updated 2 years ago
- [CVPR 2025] Multi-focal Conditioned Latent Diffusion for Person Image Synthesis ☆23 · Mar 23, 2025 · Updated last year
- ☆93 · Aug 18, 2024 · Updated last year
- Tools to manipulate and extract data from wikipedia dumps ☆47 · May 21, 2013 · Updated 13 years ago