dipampaul17 / KVSplit
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
☆360 Updated 5 months ago
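The 59% figure can be sanity-checked with simple arithmetic. The sketch below assumes llama.cpp-style block formats (`q8_0` and `q4_0`, each storing 32 elements plus one 2-byte fp16 scale) and an fp16 baseline; neither assumption is stated in the description above, and the function names are hypothetical, chosen only for illustration.

```python
# Assumed block formats (not stated in the description above): each block
# covers 32 elements; quantized blocks also carry one 2-byte fp16 scale.
BLOCK = 32
BYTES_PER_BLOCK = {
    "f16": 2 * BLOCK,        # 64 bytes, no scale needed
    "q8_0": BLOCK + 2,       # 32 int8 values + scale = 34 bytes
    "q4_0": BLOCK // 2 + 2,  # 32 packed 4-bit values + scale = 18 bytes
}


def bits_per_element(fmt):
    """Effective storage cost of one cached element, in bits."""
    return BYTES_PER_BLOCK[fmt] * 8 / BLOCK


def kv_savings(key_fmt, value_fmt):
    """Fraction of KV cache memory saved relative to f16 keys and values."""
    baseline = 2 * bits_per_element("f16")
    mixed = bits_per_element(key_fmt) + bits_per_element(value_fmt)
    return 1 - mixed / baseline


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, key_fmt, value_fmt):
    """Total KV cache size in bytes for a hypothetical model shape."""
    elements = n_layers * n_kv_heads * head_dim * n_tokens
    bits = elements * (bits_per_element(key_fmt) + bits_per_element(value_fmt))
    return bits / 8


if __name__ == "__main__":
    print(f"K8V4 savings vs f16: {kv_savings('q8_0', 'q4_0'):.1%}")
    # Hypothetical model shape, for illustration only.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)
    for k, v in [("f16", "f16"), ("q8_0", "q4_0")]:
        mib = kv_cache_bytes(**shape, key_fmt=k, value_fmt=v) / 2**20
        print(f"K={k:5s} V={v:5s}: {mib:.0f} MiB")
```

Under these assumptions keys cost 8.5 bits and values 4.5 bits per element, against 32 bits for an fp16 key-value pair, which gives a saving of 59.375% and agrees with the stated 59%. The repository's own measurement may be derived differently.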
Alternatives and similar repositories for KVSplit
Users who are interested in KVSplit are comparing it with the repositories listed below.
- High-performance implementation of OpenAI's TikToken. ☆458 Updated 3 months ago
- Add object detection, tracking, and mobile notifications to any RTSP camera or iPhone. ☆485 Updated last week
- ☆198 Updated 5 months ago
- State-of-the-art browsing agent (WebArena 72.7%) ☆349 Updated 3 weeks ago
- TUI app: give it a YouTube URL and get a transcription with possible speaker identification and an optional summary or translation, all … ☆321 Updated 7 months ago
- Browser-LLM Auto-Scaling Technology ☆748 Updated 2 weeks ago
- Content addressable storage with excellent search ☆352 Updated last week
- ☆281 Updated 4 months ago
- Git-based memory storage for conversational AI agents ☆664 Updated last month
- An attempt to create an open-source, privacy-focused Rewind.ai alternative that is a POD (Personal Online Datastore) ☆222 Updated last month
- Multimodal RAG to search and interact locally with technical documents of any kind ☆273 Updated last week
- CleverBee - The Open Source Deep Researcher Tool ☆306 Updated 4 months ago
- ☆161 Updated 7 months ago
- A secure local sandbox to run LLM-generated code using Apple containers ☆591 Updated last week
- Animating R1's thoughts. ☆385 Updated 8 months ago
- Applying the ideas of Deepseek R1 to computer use ☆216 Updated 8 months ago
- Fact Graph ☆287 Updated 2 weeks ago
- Physical AI Assistant that illuminates your life ☆187 Updated 3 weeks ago
- Simple Agents Made Easy ☆588 Updated 2 weeks ago
- A hub for various industry-specific schemas to be used with VLMs. ☆537 Updated 5 months ago
- Docker-based inference engine for AMD GPUs ☆230 Updated last year
- Fully neural approach for text chunking ☆378 Updated last week
- Examples and guides for using the VLM Run API ☆297 Updated 3 weeks ago
- Retrieval Augmented Generation based on LanceDB ☆323 Updated last week
- Chat UI for Coderunner ☆186 Updated 2 months ago
- Run and explore Llama models locally with minimal dependencies on CPU ☆189 Updated last year
- ☆296 Updated 7 months ago
- A comprehensive suite of tools, built to liberate science by making the creation, evaluation, and dissemination of research more transpar… ☆222 Updated 2 months ago
- This methodology provides a structured approach for collaborating with AI systems on software development projects. It addresses common i… ☆367 Updated last month
- LLM plugin for pulling content from Hacker News ☆120 Updated 5 months ago