arozanov/turboquant-mlx

TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% FP16 speed.

☆112

Alternatives and similar repositories for turboquant-mlx

Users that are interested in turboquant-mlx are comparing it to the libraries listed below.

marceloeatworld / clopinette-ai
Cloudflare-native AI agent — 13 tools, codemode, 5-layer memory, self-learning, multimodal I/O. Telegram, Discord & WhatsApp bots. Web se…
☆25 · Jul 7, 2026 · Updated 2 weeks ago
Doriandarko / mlx-local-server
A tiny server to run local inference on MLX models in the style of OpenAI
☆13 · Jan 31, 2024 · Updated 2 years ago
SharpAI / SwiftLM
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache…
☆723 · May 19, 2026 · Updated 2 months ago
barrontang / gguf2mlx
GGUF to MLX converter; LM Studio advanced tips; simple text generation using Apple's MLX framework; enhanced MLX implementation with transf…
☆32 · Jul 9, 2026 · Updated last week
bokiko / openClaw-dashboard
OpenClaw AI Agent Swarm Dashboard
☆33 · Apr 6, 2026 · Updated 3 months ago
lovisdotio / skills-fal-ai
☆21 · Apr 30, 2026 · Updated 2 months ago
OpenAccess-AI-Collective / ggml-webui
Deploy your GGML models to HuggingFace Spaces with Docker and gradio
☆37 · Jun 6, 2023 · Updated 3 years ago
aisar-labs / turboquant-rs
Rust implementation of TurboQuant vector quantization (ICLR 2026, Google Research)
☆21 · Mar 26, 2026 · Updated 3 months ago
Ali-expandings / mactune
The honest macOS tune-up — diagnoses what's actually slow, fixes only what's safe & reversible, scores your Mac 0-100. One bash script, z…
☆20 · Jun 13, 2026 · Updated last month
blacktop / fm-rs
Rust bindings for Apple's FoundationModels.framework
☆22 · Updated this week
Somi-Project / Somi
Local-first AI agent framework with GUI, memory, web search, personality constructs, speech i/o, tools, skills, CLI & Telegram features —…
☆23 · Mar 20, 2026 · Updated 4 months ago
1oT / YggdraSIM
Python toolkit for SIM/eSIM and eUICC work: SCP03, SCP80, SCP11 (relay, local, eIM), SAIP profile packages, and a simulated UICC/eUICC en…
☆16 · Jul 3, 2026 · Updated 2 weeks ago
Neutrinic / flare
Full-stack OpenTelemetry observability for Apache Spark
☆16 · Feb 28, 2026 · Updated 4 months ago
bstnxbt / dflash-mlx
Lossless DFlash speculative decoding for MLX on Apple Silicon
☆753 · Jun 11, 2026 · Updated last month
onur-gokyildiz-bhi / tq-kv
Pure Rust implementation of Google's TurboQuant (ICLR 2026) — KV cache compression for LLMs
☆39 · Apr 19, 2026 · Updated 3 months ago
christopherkarani / Colony
An on-device agent runtime
☆23 · May 13, 2026 · Updated 2 months ago
chacosoldier / compabob
A customizable Claude Code setup for knowledge workers: agents, safety hooks, skills, memory, and an Obsidian knowledge base. Clone, run …
☆28 · Jun 23, 2026 · Updated 3 weeks ago
locaith / bio-memory-ai-locaith
🧠 Bio-Agent OS: 🇻🇳 Bio-Inspired Memory Framework for AI Agents (OpenClaw/ERP). Researched & Developed by Dev Tuan Anh Ha (Locaith Solu…
☆20 · Apr 21, 2026 · Updated 3 months ago
lsb / sqlite-vector-search
☆31 · Sep 1, 2023 · Updated 2 years ago
konjoai / squish
⚡️ The fastest way to run local LLMs on Apple Silicon — sub-second model loads, beats Ollama on throughput, tail latency, and full-respon…
☆16 · Jul 13, 2026 · Updated last week
armgabrielyan / primer
Build real software step-by-step with Claude, Codex, OpenCode, Gemini, Cursor, and other agents
☆15 · Mar 29, 2026 · Updated 3 months ago
SRSWTI / bodega-inference-engine
Fastest runtime for Apple Silicon.
☆88 · Apr 16, 2026 · Updated 3 months ago
Beaulewis1977 / claude-code-context-command
Universal context usage analyzer for Claude Code - works from any directory in any project
☆20 · Nov 16, 2025 · Updated 8 months ago
jjang-ai / jangq
JANG — GGUF for MLX. YOU MUST USE JANG_Q RUNTIME. Adaptive Mixed-Precision Quantization + Runtime for Apple Silicon
☆213 · Updated this week
deepfates / splice
Import your conversation history into AI 🫚
☆16 · Jul 14, 2026 · Updated last week
The-Swarm-Corporation / NewsAgent
NewsAgent is an enterprise-grade news aggregation agent designed to fetch, query, and summarize news from multiple sources at scale.
☆29 · Oct 13, 2025 · Updated 9 months ago
bug-ops / zeph
A memory-first AI agent that remembers why decisions were made — not just the last message. Runs local (Ollama), cloud (Claude · OpenAI ·…
☆51 · Updated this week
Liquid4All / leap-ios
☆40 · Mar 12, 2026 · Updated 4 months ago
dirmacs / ares
Agentic AI server in Rust. Multi-provider LLM routing, tool calling, RAG, MCP, multi-tenant workflows.
☆16 · Jun 18, 2026 · Updated last month
marcogva-hub / mlx-flashattention-steel
Metal Flash Attention for MLX
☆19 · Jul 14, 2026 · Updated last week
0xSero / open-trees
An Opencode plugin for managing git worktrees.
☆75 · Mar 25, 2026 · Updated 3 months ago
Doorman11991 / MarrowScript
MarrowScript compiler. Welcome to deterministic, typed LLM orchestration as a compile-time concern.
☆32 · May 21, 2026 · Updated 2 months ago
telekom / sparrow
A monitoring tool to gather infrastructure network information
☆24 · Updated this week
wa91h / local-ai-toolkit
A self-hosted AI toolkit running locally via Docker Compose, bundling an LLM gateway, workflow automation, and a chat UI — all backed by …
☆16 · May 17, 2026 · Updated 2 months ago
latenceainew / colsearch
High-performance late-interaction retrieval engine for on-prem AI. ColBERT/ColPali multi-vector search with Rust fused MaxSim, Triton GPU…
☆17 · Jul 6, 2026 · Updated 2 weeks ago
juxt / allium-tools
☆22 · Jun 19, 2026 · Updated last month
createthis / diffcalculia
☆16 · May 8, 2025 · Updated last year
mzbac / mlx-llm-server
For inferring and serving local LLMs using the MLX framework
☆115 · Mar 24, 2024 · Updated 2 years ago
exo-explore / mlx-bitnet
1.58 Bit LLM on Apple Silicon using MLX
☆294 · May 10, 2024 · Updated 2 years ago