dipampaul17 / KVSplit
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
☆356 Updated last month
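The 59% figure in the description can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is not taken from the KVSplit code. It assumes an FP16 baseline cache and llama.cpp-style block quantization, where every block of 32 elements carries one 16-bit scale (as in the `q8_0` and `q4_0` formats). The names `bits_per_element` and `kv_reduction` are hypothetical helpers for illustration only.

```python
# Assumption: blocks of 32 elements, each with one fp16 (16-bit) scale,
# as in llama.cpp's q8_0 and q4_0 formats. Not confirmed by the source.
BLOCK_SIZE = 32
SCALE_BITS = 16


def bits_per_element(payload_bits: int) -> float:
    """Effective storage per element, including the per-block scale."""
    return payload_bits + SCALE_BITS / BLOCK_SIZE


def kv_reduction(key_bits: int, value_bits: int, baseline_bits: float = 16.0) -> float:
    """Fractional memory saved versus a cache storing K and V at baseline_bits."""
    quantized = bits_per_element(key_bits) + bits_per_element(value_bits)
    baseline = 2 * baseline_bits  # one key element plus one value element
    return 1 - quantized / baseline


reduction = kv_reduction(key_bits=8, value_bits=4)
print(f"K8V4 saves about {reduction:.1%} versus FP16")
```

Under these assumptions, 8-bit keys cost 8.5 bits per element and 4-bit values cost 4.5, so a key/value pair takes 13 bits instead of 32, which is consistent with the quoted reduction of about 59%.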
Alternatives and similar repositories for KVSplit
Users who are interested in KVSplit are comparing it to the libraries listed below.
- High-Performance Implementation of OpenAI's TikToken. ☆432 Updated 2 weeks ago
- Browser-LLM Auto-Scaling Technology ☆531 Updated this week
- Attempt to create an Open Source Privacy Focused Rewind.ai Alternative for data capture ☆216 Updated 5 months ago
- Fully neural approach for text chunking ☆367 Updated 2 months ago
- MCP server and CLI tool for searching and downloading documents from Anna's Archive ☆399 Updated last week
- ☆278 Updated last month
- ☆640 Updated this week
- ☆163 Updated 3 months ago
- Your personal plug and play memory layer for LLMs ☆417 Updated last week
- Minimal AI agent framework that just works with only seven tools. ☆540 Updated this week
- A hub for various industry-specific schemas to be used with VLMs. ☆525 Updated last month
- This project collects GPU benchmarks from various cloud providers and compares them to fixed per token costs. Use our tool for efficient … ☆221 Updated 7 months ago
- CleverBee - The Open Source Deep Researcher Tool ☆301 Updated last month
- ☆196 Updated 2 months ago
- Retrieval Augmented Generation based on SQLite ☆258 Updated this week
- Examples and guides for using the VLM Run API ☆283 Updated this week
- Docker-based inference engine for AMD GPUs ☆231 Updated 9 months ago
- ai for jq ☆243 Updated 9 months ago
- Applying the ideas of Deepseek R1 to computer use ☆214 Updated 5 months ago
- Spegel - Reflect the web through AI ☆299 Updated last week
- Animating R1's thoughts. ☆383 Updated 5 months ago
- Browsers-as-a-service for automations and web agents ☆376 Updated last week
- GUI for selecting text files for concatenation and submission to LLMs ☆176 Updated last week
- Run and explore Llama models locally with minimal dependencies on CPU ☆191 Updated 9 months ago
- Your toolkit for autonomous, evolving agent ecosystems. Create, execute, govern, and evolve agents that learn from experience, collaborat… ☆439 Updated this week
- Implement recursion using English as the programming language and an LLM as the runtime. ☆238 Updated 2 years ago
- A comprehensive suite of tools, built to liberate science by making the creation, evaluation, and dissemination of research more transpar… ☆200 Updated last month
- HTTP API for Claude Code, Goose, Aider, and Codex ☆649 Updated last week
- Min.js Style Compression of Tech Docs for LLM Context ☆640 Updated last month
- RAG Logger is an open-source logging tool designed specifically for Retrieval-Augmented Generation (RAG) applications. It serves as a lig… ☆222 Updated 6 months ago