Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
☆361 · May 21, 2025 · Updated 10 months ago
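The 59% figure can be sanity-checked with simple KV cache arithmetic. The sketch below rests on several assumptions that are not stated above. It assumes an FP16 baseline for both keys and values. It also assumes llama.cpp-style block quantization, where `q8_0` stores 32 int8 values plus one FP16 scale per block (8.5 bits per element) and `q4_0` stores 32 4-bit values plus one FP16 scale (4.5 bits per element). The model dimensions and the helper name `kv_cache_bytes` are hypothetical and chosen only for illustration. This is not KVSplit's own code.

```python
# Hypothetical KV cache size estimate. The bit widths assume llama.cpp-style
# block formats and an FP16 baseline; the model dimensions are illustrative.
BITS_PER_ELEMENT = {
    "f16": 16.0,   # assumed baseline: half-precision K and V
    "q8_0": 8.5,   # 32 x int8 + one fp16 scale per block = 272 bits / 32
    "q4_0": 4.5,   # 32 x 4-bit + one fp16 scale per block = 144 bits / 32
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, k_type, v_type):
    """Bytes needed to store K and V for every layer, head and position."""
    elems = n_layers * n_kv_heads * head_dim * n_ctx  # elements per tensor (K or V)
    bits = elems * (BITS_PER_ELEMENT[k_type] + BITS_PER_ELEMENT[v_type])
    return bits / 8


# Illustrative dimensions (hypothetical): 32 layers, 8 KV heads, 128-dim heads, 32K context.
dims = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)
base = kv_cache_bytes(**dims, k_type="f16", v_type="f16")
split = kv_cache_bytes(**dims, k_type="q8_0", v_type="q4_0")
print(f"F16: {base / 2**30:.2f} GiB, K8V4: {split / 2**30:.2f} GiB, "
      f"saved {1 - split / base:.1%}")
```

Under these assumptions, the ratio is independent of model size: (8.5 + 4.5) / (16 + 16) = 13/32, so the saving is 19/32, or about 59.4%. That plausibly accounts for the quoted 59%. With pure 8-bit and 4-bit storage and no per-block scales, the saving would be 62.5%.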
Alternatives and similar repositories for KVSplit
Users who are interested in KVSplit are comparing it to the libraries listed below.
- Merliot Device Hub ☆166 · Jun 11, 2025 · Updated 10 months ago
- Fully neural approach for text chunking ☆407 · Oct 23, 2025 · Updated 5 months ago
- Artificial Neural Engine Machine Learning Library ☆1,569 · Mar 10, 2026 · Updated last month
- Docker-based inference engine for AMD GPUs ☆233 · Oct 7, 2024 · Updated last year
- A simple alternative to Homebrew for installing binary packages on macOS & Linux, written in Go. ☆220 · Feb 16, 2026 · Updated 2 months ago
- ☆10 · Feb 14, 2025 · Updated last year
- A browser-based WebGL2 implementation of GPT-2 with transformer block and attention matrix visualization ☆342 · Oct 24, 2025 · Updated 5 months ago
- ☆199 · May 5, 2025 · Updated 11 months ago
- Erlang interpreter for Node-RED (visual flow-based programming) with Elixir support ☆333 · Apr 3, 2026 · Updated last week
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ☆1,096 · Aug 9, 2025 · Updated 8 months ago
- High-performance implementation of OpenAI's TikToken. ☆473 · Jul 3, 2025 · Updated 9 months ago
- Multi-modal OCR pipeline optimized for ML training (text, figures, math, tables, diagrams) ☆682 · May 20, 2025 · Updated 10 months ago
- PyTorch script hot swap: change code without unloading your LLM from VRAM ☆125 · Apr 21, 2025 · Updated 11 months ago
- Min.js-style compression of tech docs for LLM context ☆676 · Oct 5, 2025 · Updated 6 months ago
- Official Rust implementation of Model2Vec ☆170 · Updated this week
- Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full… ☆13,436 · Updated this week
- Rewriting Principia Mathematica in Lean ☆137 · Feb 5, 2026 · Updated 2 months ago
- Browser-LLM Auto-Scaling Technology ☆775 · Jan 29, 2026 · Updated 2 months ago
- Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code. ☆630 · Feb 24, 2025 · Updated last year
- High-performance Rust stream processing engine that seamlessly integrates AI capabilities, providing powerful real-time data processing and in… ☆1,266 · Apr 1, 2026 · Updated 2 weeks ago
- LLM plugin for pulling content from Hacker News ☆126 · May 5, 2025 · Updated 11 months ago
- Resource (icon) extraction tools ☆13 · Apr 22, 2024 · Updated last year
- VPN over UDP ☆114 · Feb 3, 2026 · Updated 2 months ago
- Server for matching long/lat to timezone ☆47 · Feb 21, 2026 · Updated last month
- A new wide-spectrum content blocker for Safari. ☆363 · Feb 16, 2026 · Updated 2 months ago
- A native macOS app that allows users to chat with a local LLM that can respond with information from files, folders and websites on your … ☆3,202 · Apr 4, 2026 · Updated last week
- ☆1,308 · Aug 21, 2025 · Updated 7 months ago
- Render source code in 3D, for macOS and iOS. ☆197 · Dec 1, 2024 · Updated last year
- A cache for AI agents to learn and replay complex behaviors. ☆756 · Jun 15, 2025 · Updated 10 months ago
- Real-time guardrail that shows token spend & kills runaway LLM/agent loops. ☆159 · Jul 31, 2025 · Updated 8 months ago
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ☆703 · Jun 14, 2025 · Updated 10 months ago
- JPG Acropalypse POC ☆30 · Mar 18, 2023 · Updated 3 years ago
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. ☆3,131 · Jul 7, 2025 · Updated 9 months ago
- A small, lightweight crate for numerical integration written in Rust. ☆115 · Mar 21, 2026 · Updated 3 weeks ago
- A concurrent hash array mapped trie implementation in Go ☆59 · Jun 19, 2025 · Updated 9 months ago
- A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen. ☆4,069 · Mar 26, 2026 · Updated 3 weeks ago
- You should own and be able to do anything with YOUR social data, not just the apps, AIs, and algorithms of the profit-oriented companies t… ☆50 · Updated this week
- Offline SOS signaling and recovery app for wars and disasters (iOS & Android), like a digital flare gun. ☆339 · May 2, 2025 · Updated 11 months ago
- TideCloak lets your users hold their own digital authority: no central control, no blind trust. ☆64 · Jul 28, 2025 · Updated 8 months ago