Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
☆362 May 21, 2025 Updated 9 months ago
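The 59% figure can be sanity-checked with a little arithmetic. The sketch below is not taken from KVSplit itself. It assumes a block-quantized layout in which every 32 values share one 16-bit scale (the layout used by llama.cpp's `q8_0` and `q4_0` formats), and the model shape passed to `kv_cache_bytes` is a hypothetical example chosen only for illustration.

```python
# Back-of-the-envelope KV cache sizing for mixed K/V precision.
# Assumption (not from the KVSplit description): values are stored in
# blocks of 32 with one 16-bit scale per block, as in llama.cpp's
# q8_0 and q4_0 formats.

def bits_per_value(payload_bits, block_size=32, scale_bits=16):
    """Effective bits per stored value, including the per-block scale."""
    return payload_bits + scale_bits / block_size


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, k_bits, v_bits):
    """Total bytes for keys plus values across all layers and tokens."""
    elems = n_layers * n_kv_heads * head_dim * n_tokens
    return elems * (k_bits + v_bits) / 8


FP16_BITS = 16.0
k8_bits = bits_per_value(8)  # 8.5 effective bits per key element
v4_bits = bits_per_value(4)  # 4.5 effective bits per value element

# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, 8192 tokens.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)

baseline = kv_cache_bytes(**shape, k_bits=FP16_BITS, v_bits=FP16_BITS)
split = kv_cache_bytes(**shape, k_bits=k8_bits, v_bits=v4_bits)
saving = 1 - split / baseline

print(f"FP16 cache: {baseline / 2**20:.0f} MiB")
print(f"K8V4 cache: {split / 2**20:.0f} MiB")
print(f"Saving:     {saving:.1%}")
```

Under these assumptions the saving is 1 - (8.5 + 4.5) / 32, about 59.4%, regardless of model shape, which is consistent with the stated 59%. Without the per-block scale overhead the nominal figure would be 62.5%.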
Alternatives and similar repositories for KVSplit
Users who are interested in KVSplit are comparing it to the libraries listed below.
- Merliot Device Hub ☆166 Jun 11, 2025 Updated 8 months ago
- Docker-based inference engine for AMD GPUs ☆233 Oct 7, 2024 Updated last year
- Fully neural approach for text chunking ☆406 Oct 23, 2025 Updated 4 months ago
- A browser-based, WebGL2 implementation of GPT-2 with transform block and attention matrix visualization ☆342 Oct 24, 2025 Updated 4 months ago
- Artificial Neural Engine Machine Learning Library ☆1,351 Feb 27, 2026 Updated last week
- ☆200 May 5, 2025 Updated 10 months ago
- A simple alternative to homebrew for installing binary packages on MacOS & Linux written in Go. ☆216 Feb 16, 2026 Updated 2 weeks ago
- Erlang interpreter for Node-RED (visual flow based programming) with Elixir support ☆333 Jan 8, 2026 Updated last month
- Official Rust Implementation of Model2Vec ☆160 Feb 5, 2026 Updated last month
- Min.js Style Compression of Tech Docs for LLM Context ☆670 Oct 5, 2025 Updated 5 months ago
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆682 May 20, 2025 Updated 9 months ago
- ☆10 Feb 14, 2025 Updated last year
- TideCloak lets your users hold their own digital authority: no central control, no blind trust. ☆64 Jul 28, 2025 Updated 7 months ago
- Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full… ☆12,761 Feb 27, 2026 Updated last week
- Transductive regular expressions ☆254 Sep 25, 2025 Updated 5 months ago
- Server for Matching Long/Lat to Timezone ☆47 Feb 21, 2026 Updated last week
- Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code. ☆626 Feb 24, 2025 Updated last year
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ☆1,096 Aug 9, 2025 Updated 6 months ago
- Resource (icon) extraction tools ☆12 Apr 22, 2024 Updated last year
- An MCP server that runs AI-driven venture capitalist agents (Fred Wilson, Peter Thiel, etc.), whose thinking is continuously enriched by … ☆20 May 12, 2025 Updated 9 months ago
- Rewriting Principia Mathematica in Lean ☆138 Feb 5, 2026 Updated last month
- High-Performance Implementation of OpenAI's TikToken. ☆473 Jul 3, 2025 Updated 8 months ago
- Proof of concept for a VPN over UDP ☆115 Feb 3, 2026 Updated last month
- ☆1,296 Aug 21, 2025 Updated 6 months ago
- This repo contains a new way to use bloom filters to do lossless video compression ☆250 Jun 5, 2025 Updated 9 months ago
- Browser-LLM Auto-Scaling Technology ☆777 Jan 29, 2026 Updated last month
- A native macOS app that allows users to chat with a local LLM that can respond with information from files, folders and websites on your … ☆3,184 Nov 17, 2025 Updated 3 months ago
- See Through Your Models ☆400 Jul 8, 2025 Updated 7 months ago
- A JPEG Image Compression Service using Part Homomorphic Encryption. ☆31 Mar 7, 2025 Updated 11 months ago
- Snap-Scope: Analyze Your Lens Focal Length Distribution 📸✨ ☆20 Jul 15, 2025 Updated 7 months ago
- A new wide-spectrum content blocker for Safari. ☆365 Feb 16, 2026 Updated 2 weeks ago
- AI Dataset Generator – Create realistic datasets for demos, learning, and dashboards ☆752 Oct 3, 2025 Updated 5 months ago
- High performance Rust stream processing engine seamlessly integrates AI capabilities, providing powerful real-time data processing and in… ☆1,252 Feb 28, 2026 Updated last week
- Neurox control helm chart details ☆30 Apr 29, 2025 Updated 10 months ago
- ☆271 Aug 22, 2025 Updated 6 months ago
- A lightweight tool that converts directory contents into structured output optimized for LLM interpretation, featuring Git-aware file ord… ☆18 Nov 27, 2025 Updated 3 months ago
- LLM plugin for pulling content from Hacker News ☆125 May 5, 2025 Updated 10 months ago
- Code release for "LLMs can see and hear without any training" ☆457 May 8, 2025 Updated 9 months ago
- A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen. ☆3,894 Updated this week