AmesianX/TurboQuant



TurboQuant KV Cache Compression for llama.cpp — 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)

☆91

Alternatives and similar repositories for TurboQuant

Users who are interested in TurboQuant are comparing it to the repositories listed below.

dhawalc / turboQuantDC
☆39 • Apr 27, 2026 • Updated 2 months ago
eullm / eullm
Open-source platform for creating, distributing and running sovereign EU-compliant LLMs. Verticalize any model for your domain, language …
☆50 • Updated this week
aivrar / multi-turboquant
Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU pl…
☆24 • Jul 11, 2026 • Updated last week
Etherll / Timbre
Extract a target speaker’s clean, non-overlapped speech from multi-speaker audio and export word-safe LJSpeech-style TTS datasets.
☆21 • Jun 14, 2026 • Updated last month
LessUp / meta-human
Browser-native 3D digital human engine with voice, vision & dialogue. Zero-config, offline-ready AI avatar platform. | 浏览器原生 3D 数字人引擎，支持语…
☆24 • Jul 12, 2026 • Updated last week
Madreag / turbo3-cuda
LLM inference in C/C++
☆36 • Apr 12, 2026 • Updated 3 months ago
aleko2144 / KoTR_Modern_Patch
King of the Road (Дальнобойщики 2) Modern Patch sources
☆18 • Jul 6, 2026 • Updated 2 weeks ago
TheTom / llama-cpp-turboquant
LLM inference in C/C++
☆2,152 • Updated this week
GVDub / panai-seed-node
“A locally hosted, memory-aware AI microservice—designed for cultural continuity, decentralized intelligence, and ethical autonomy.”
☆27 • May 1, 2025 • Updated last year
wazionapps / nexo
NEXO runtime core for NEXO Desktop: local memory, automation, MCP tools and update-managed runtime.
☆27 • Jul 3, 2026 • Updated 2 weeks ago
desh2608 / kaldi-noise-vectors
Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.
☆13 • Feb 13, 2021 • Updated 5 years ago
locaith / bio-memory-ai-locaith
🧠 Bio-Agent OS: 🇻🇳 Bio-Inspired Memory Framework for AI Agents (OpenClaw/ERP). Researched & Developed by Dev Tuan Anh Ha (Locaith Solu…
☆20 • Apr 21, 2026 • Updated 3 months ago
AjeetGitHub2016 / deeplearning.ai
deeplearning.ai is the complete course on Deep Learning on Coursera. The instructor of this course is Andrew Ng. Programming assignments…
☆12 • Jul 6, 2018 • Updated 8 years ago
youngharold / tightwad
Mixed-vendor GPU inference cluster manager with speculative decoding
☆28 • Jul 2, 2026 • Updated 2 weeks ago
Cioscos / DFL-Colab
☆17 • Feb 18, 2024 • Updated 2 years ago
davccavalcante / claude-code-leaked
Anthropic Claude Code CLI — Official CLI/TUI coding agent, rebuilt from a leaked source map v2.1.88 (March 2026).
☆33 • Mar 31, 2026 • Updated 3 months ago
Pixedar / TraceScope
Embed, cluster, and visualize any collection of texts in 3D semantic space — then learn a continuous semantic flow field over that space,…
☆18 • May 9, 2026 • Updated 2 months ago
MerlinStacks / overseek
OverSeek is your open-source command center—a self-hosted, privacy-first platform that unifies analytics, automation, inventory, and cust…
☆26 • Updated this week
adelacvg / diff-vits
☆39 • Oct 1, 2023 • Updated 2 years ago
kdrkdrkdr / GoodbyeLaver
Decensoring Hentai
☆13 • Sep 19, 2022 • Updated 3 years ago
srijanshukla18 / xray
XRAY MCP provides progressive code intelligence and navigation capabilities for AI assistants through structural code analysis using as…
☆51 • Dec 11, 2025 • Updated 7 months ago
Mijalski / DynamicCorsPolicy
CORS Policy with dynamic resolver allowing origins configured at the startup as well as others based on the implementation of the method.…
☆12 • Nov 9, 2020 • Updated 5 years ago
PasiKoodaa / ACE-Step-RADIO
ACE-Step: A Step Towards Music Generation Foundation Model
☆50 • May 20, 2025 • Updated last year
caiovicentino / polarengine-vllm
PolarEngine: vLLM plugin for PolarQuant quantized LLM inference — 75% FP16 speed at 2.3x less VRAM
☆34 • Apr 13, 2026 • Updated 3 months ago
latenceainew / colsearch
High-performance late-interaction retrieval engine for on-prem AI. ColBERT/ColPali multi-vector search with Rust fused MaxSim, Triton GPU…
☆17 • Jul 6, 2026 • Updated 2 weeks ago
reppy4620 / x-vits
☆14 • Aug 1, 2025 • Updated 11 months ago
dceluis / ln-diff
Line-numbered patch format. Non-sequential, llm and stream-friendly
☆15 • Nov 7, 2024 • Updated last year
RecursiveIntell / turbo-quant
Rust implementation of TurboQuant, PolarQuant, and QJL — zero-overhead vector quantization for semantic search and KV cache compression (…
☆29 • May 31, 2026 • Updated last month
Toowiredd / claude-skills-automation
Fully automated memory and context management for Claude Code using hooks - Zero friction, zero context loss
☆32 • Oct 22, 2025 • Updated 8 months ago
TheTom / turboquant_plus
☆6,997 • Jun 26, 2026 • Updated 3 weeks ago
BoFan-tunning / llama.cpp-MTP-TurboQuant
☆142 • Jun 13, 2026 • Updated last month
shenald-dev / one-api
⚡ One API for 20+ LLM providers. OpenAI-compatible, single binary, runs forever.
☆15 • Jun 1, 2026 • Updated last month
tuska298 / djmax-random-selector-v
A program for selecting music randomly in DJMAX RESPECT V
☆11 • Jan 30, 2025 • Updated last year
xiaoch2004 / librosa_py3_pYIN
pYIN pitch detection implementation with librosa and python 3
☆14 • Jul 16, 2019 • Updated 7 years ago
ShawnShiSS / machine-learning-applications
A collection of real-world machine learning web applications built with ML.NET, ASP.NET Core, Azure Cosmos DB, and React, which can be us…
☆11 • Jan 25, 2021 • Updated 5 years ago
thedataquarry / structured-outputs
Structured output benchmarks comparing DSPy and BAML with different LLMs
☆28 • Dec 23, 2025 • Updated 6 months ago
CodeBeamOrg / BCSS
Revolutionary Runtime CSS Generator for Blazor
☆14 • Apr 3, 2026 • Updated 3 months ago
hmislk / hmis-analyzer-middleware
Middle-ware for LIMS
☆10 • Mar 30, 2023 • Updated 3 years ago
I3K-IT / RAG-Enterprise
🚀 100% local RAG system with one-command setup. Your data never leaves your server. AGPL-3.0
☆57 • Jun 23, 2026 • Updated 3 weeks ago