TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
☆1,536Mar 27, 2026Updated 2 months ago
Alternatives and similar repositories for turboquant
Users who are interested in turboquant are comparing it to the libraries listed below.
- KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44…☆1,010Apr 23, 2026Updated last month
- ☆169Mar 30, 2026Updated 2 months ago
- Rust library implementing the Toorani-Beheshti signcryption scheme☆13Aug 15, 2023Updated 2 years ago
- LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss☆57Mar 30, 2026Updated 2 months ago
- Signaling server implemented in Go, plus a WebRTC publish/pull streaming server and a PC publish/pull streaming SDK based on webrtc-m96☆14May 13, 2023Updated 3 years ago
- The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"☆165May 15, 2026Updated 3 weeks ago
- A simple GPT-3 interface to automate core legal writing tasks☆13Mar 8, 2023Updated 3 years ago
- Official repository for Flash Local Linear Attention☆36May 28, 2026Updated last week
- Community maintained mobile app for use with Lemonade. Join our discord: https://discord.gg/5xXzkMu8Zk☆31Jun 1, 2026Updated last week
- MLX Implementation of Recursive Reasoning with Tiny Networks☆78Oct 11, 2025Updated 7 months ago
- SVG Analysis and generation tools for commonly seen SVG attachment phishing☆57Sep 24, 2025Updated 8 months ago
- Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tu…☆97Mar 6, 2026Updated 3 months ago
- ToRoLaMa: The Vietnamese Instruction-Following and Chat Model☆24Jan 4, 2024Updated 2 years ago
- update code for pytorch1.4☆11Aug 12, 2021Updated 4 years ago
- The official implementation of the ICML 2023 paper OFQ-ViT☆39Oct 3, 2023Updated 2 years ago
- Talk to your shell in natural language. Locally.☆54Feb 15, 2026Updated 3 months ago
- Universal MCP server installer - install any MCP server to any AI agent with one command☆20Feb 14, 2026Updated 3 months ago
- (CVPR 2026) PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction☆59May 31, 2026Updated last week
- ☆13Sep 25, 2023Updated 2 years ago
- Responsive images using Imager.js and Polymer☆73Dec 14, 2015Updated 10 years ago
- A question answering AI tool for the content from the PDF files of the Civil Code, Criminal Code, Code of Criminal Procedure, Labor Stand…☆12May 14, 2023Updated 3 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 4 months ago
- A WebAssembly eBPF runtime based on wasmtime in rust☆11Feb 20, 2023Updated 3 years ago
- Compression schema for gradients of activations in backward pass☆45Jul 26, 2023Updated 2 years ago
- Data and Code for COLM 2025 Paper "MSRS: Evaluating Multi-Source Retrieval-Augmented Generation"☆32Aug 29, 2025Updated 9 months ago
- Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚☆22Jul 14, 2025Updated 10 months ago
- LEMMA: Logical Engine for Multi-domain Mathematical Analysis☆28Feb 14, 2026Updated 3 months ago
- A Tensorflow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio…☆10Dec 18, 2019Updated 6 years ago
- Production-grade template for Pi-powered Chrome extensions☆51May 16, 2026Updated 3 weeks ago
- Intent recognition and slot filling implemented with PyTorch and BERT☆11May 30, 2023Updated 3 years ago
- PyTorch Implementation of Image Generation with a Sphere Encoder☆44May 20, 2026Updated 3 weeks ago
- Bug Bounty Monitor☆15Nov 23, 2020Updated 5 years ago
- Scripts to check kernel CONFIG_ values used by Mer☆10May 29, 2026Updated last week
- A portable, embeddable implementation of the BASIC programming language.☆16Feb 21, 2013Updated 13 years ago
- Book Quick Starter Kit - Write Your Own Book in Plain Text (with Markdown)☆20Jul 4, 2016Updated 9 years ago
- A PyTorch implementation of a conditional Denoising Diffusion Probabilistic Model (DDPM) for multi-modal trajectory prediction. This proj…☆39Feb 20, 2026Updated 3 months ago
- Knowledge graph module of the Chatbot_CN project☆12Mar 27, 2020Updated 6 years ago
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference☆43Oct 29, 2025Updated 7 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"☆127Oct 15, 2025Updated 7 months ago