dipampaul17 / KVSplit
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
☆359 · Updated 3 months ago
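
As a rough check on the memory figure in the description, the arithmetic can be sketched as follows. The block layouts used here are an assumption (llama.cpp-style `q8_0` and `q4_0` blocks, each storing 32 values plus one 16-bit scale); the listing itself does not state which formats KVSplit uses. The model shape in the usage lines is also hypothetical.

```python
# Assumed block layouts (llama.cpp-style; not stated in the listing):
#   q8_0: 32 int8 values + one fp16 scale = 34 bytes per 32 elements
#   q4_0: 32 4-bit values + one fp16 scale = 18 bytes per 32 elements
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits per element
    "q4_0": 18 * 8 / 32,  # 4.5 bits per element
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, key_type, value_type):
    """Estimate KV cache size in bytes for the given key and value formats."""
    elements = n_layers * n_kv_heads * head_dim * n_tokens  # per tensor (K or V)
    bits = elements * (BITS_PER_ELEMENT[key_type] + BITS_PER_ELEMENT[value_type])
    return bits / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)

baseline = kv_cache_bytes(**shape, key_type="f16", value_type="f16")
split = kv_cache_bytes(**shape, key_type="q8_0", value_type="q4_0")
reduction = 1 - split / baseline

print(f"FP16 cache: {baseline / 2**20:.0f} MiB")
print(f"K8V4 cache: {split / 2**20:.0f} MiB")
print(f"Reduction:  {reduction:.1%}")
```

Under these assumptions the reduction is independent of model shape, since it depends only on the ratio of bits per element (13 versus 32), which is in line with the roughly 59% stated above.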
Alternatives and similar repositories for KVSplit
Users who are interested in KVSplit are comparing it to the repositories listed below.
- High-Performance Implementation of OpenAI's TikToken. ☆451 · Updated 2 months ago
- Add object detection, tracking, and mobile notifications to any RTSP Camera or iPhone. ☆441 · Updated last week
- Browser-LLM Auto-Scaling Technology ☆548 · Updated 3 weeks ago
- state of the art browsing agent (WebArena 72.7%) ☆347 · Updated this week
- ☆280 · Updated 3 months ago
- Attempt to create an Open Source Privacy Focused Rewind.ai Alternative that is a POD (Personal Online Datastore) ☆221 · Updated this week
- Git Based Memory Storage for Conversational AI Agent ☆608 · Updated last week
- Physical AI Assistant that illuminates your life ☆162 · Updated last month
- TUI app- Give it a YouTube URL and you get a transcription with possible speaker identification and optional summary or translation, all … ☆318 · Updated 5 months ago
- Applying the ideas of Deepseek R1 to computer use ☆216 · Updated 7 months ago
- CleverBee - The Open Source Deep Researcher Tool ☆307 · Updated 3 months ago
- Content addressable storage with excellent search ☆348 · Updated this week
- This project collects GPU benchmarks from various cloud providers and compares them to fixed per token costs. Use our tool for efficient … ☆219 · Updated 9 months ago
- ☆196 · Updated 4 months ago
- A secure local sandbox to run LLM-generated code using Apple containers ☆520 · Updated this week
- Fully neural approach for text chunking ☆370 · Updated 4 months ago
- Animating R1's thoughts. ☆384 · Updated 7 months ago
- Multimodal RAG to search and interact locally with technical documents of any kind ☆252 · Updated last month
- Minimal AI agent framework that just works with only seven tools. ☆548 · Updated 2 months ago
- Chat UI for Coderunner ☆180 · Updated last month
- ☆163 · Updated 5 months ago
- Docker-based inference engine for AMD GPUs ☆230 · Updated 11 months ago
- Spegel - Reflect the web through AI ☆313 · Updated 2 months ago
- A hub for various industry-specific schemas to be used with VLMs. ☆533 · Updated 3 months ago
- RAG Logger is an open-source logging tool designed specifically for Retrieval-Augmented Generation (RAG) applications. It serves as a lig… ☆225 · Updated 8 months ago
- ☆664 · Updated last month
- Examples and guides for using the VLM Run API ☆293 · Updated 2 months ago
- Retrieval Augmented Generation based on LanceDB ☆316 · Updated this week
- BookWith – A New Reading Experience with AI. A next-generation conversational reading platform that goes beyond traditional e-book reader… ☆210 · Updated last month
- Run and explore Llama models locally with minimal dependencies on CPU ☆190 · Updated 11 months ago