lemonade-sdk / lemonade
Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs. Join our Discord: https://discord.gg/Z3u8tpqQ
☆381 · Updated this week
Alternatives and similar repositories for lemonade
Users interested in lemonade are comparing it to the libraries listed below.
- Lightweight inference server for OpenVINO ☆191 · Updated 2 weeks ago
- Minimal Linux OS with a Model Context Protocol (MCP) gateway to expose local capabilities to LLMs. ☆260 · Updated last month
- InferX is an Inference Function-as-a-Service platform ☆119 · Updated 2 weeks ago
- A platform to self-host AI on easy mode ☆156 · Updated this week
- The Fastest Way to Fine-Tune LLMs Locally ☆313 · Updated 4 months ago
- Run LLM Agents on Ryzen AI PCs in Minutes ☆485 · Updated last month
- Official Python implementation of the UTCP ☆364 · Updated last week
- Sparse inferencing for transformer-based LLMs ☆196 · Updated last week
- ☆152 · Updated last week
- Manifold is a platform for enabling workflow automation using AI assistants. ☆455 · Updated last week
- A cross-platform desktop application that lets you chat with locally hosted LLMs, with features like MCP support ☆221 · Updated this week
- No-code CLI designed for accelerating ONNX workflows ☆207 · Updated last month
- Docs for GGUF quantization (unofficial) ☆205 · Updated 3 weeks ago
- ☆207 · Updated 2 weeks ago
- ☆290 · Updated this week
- Fully Open Language Models with Stellar Performance ☆241 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆964 · Updated last week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆67 · Updated last month
- A web application that converts speech to speech, 100% private ☆73 · Updated 2 months ago
- ☆109 · Updated this week
- Command-line personal assistant using your favorite proprietary or local models, with access to 30+ tools ☆110 · Updated last month
- ☆217 · Updated 3 months ago
- Local AI voice assistant stack for Home Assistant (GPU-accelerated) with persistent memory, follow-up conversation, and Ollama model reco… ☆99 · Updated last week
- Code execution utilities for Open WebUI & Ollama ☆290 · Updated 8 months ago
- ☆81 · Updated last week
- ☆133 · Updated 3 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆98 · Updated last month
- The specification for the Universal Tool Calling Protocol ☆176 · Updated last week
- Review/check GGUF files and estimate memory usage and maximum tokens per second. ☆189 · Updated 2 weeks ago
- Easy-to-use interface for the Whisper model, optimized for all GPUs! ☆264 · Updated this week