lemonade-sdk / lemonade
Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPUs. Join our Discord: https://discord.gg/Z3u8tpqQ
☆381 · Updated this week
Alternatives and similar repositories for lemonade
Users interested in lemonade are comparing it to the libraries listed below.
- Lightweight inference server for OpenVINO ☆191 · Updated 2 weeks ago
- Minimal Linux OS with a Model Context Protocol (MCP) gateway to expose local capabilities to LLMs. ☆260 · Updated last month
- InferX is an Inference Function-as-a-Service platform ☆119 · Updated 2 weeks ago
- A platform to self-host AI on easy mode ☆156 · Updated this week
- The Fastest Way to Fine-Tune LLMs Locally ☆313 · Updated 4 months ago
- Run LLM Agents on Ryzen AI PCs in Minutes ☆485 · Updated last month
- Official Python implementation of the UTCP ☆364 · Updated last week
- Sparse inferencing for transformer-based LLMs ☆196 · Updated last week
- ☆152 · Updated last week
- Manifold is a platform for enabling workflow automation using AI assistants. ☆455 · Updated last week
- A cross-platform desktop application that lets you chat with locally hosted LLMs, with features like MCP support ☆221 · Updated this week
- No-code CLI designed for accelerating ONNX workflows ☆207 · Updated last month
- Docs for GGUF quantization (unofficial) ☆205 · Updated 3 weeks ago
- ☆207 · Updated 2 weeks ago
- ☆290 · Updated this week
- Fully Open Language Models with Stellar Performance ☆241 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆964 · Updated last week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆67 · Updated last month
- A web application that converts speech to speech, 100% private ☆73 · Updated 2 months ago
- ☆109 · Updated this week
- Command-line personal assistant using your favorite proprietary or local models, with access to 30+ tools ☆110 · Updated last month
- ☆217 · Updated 3 months ago
- Local AI voice assistant stack for Home Assistant (GPU-accelerated) with persistent memory, follow-up conversation, and Ollama model reco… ☆99 · Updated last week
- Code execution utilities for Open WebUI & Ollama ☆290 · Updated 8 months ago
- ☆81 · Updated last week
- ☆133 · Updated 3 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆98 · Updated last month
- The specification for the Universal Tool Calling Protocol ☆176 · Updated last week
- Review/check GGUF files and estimate memory usage and maximum tokens per second. ☆189 · Updated 2 weeks ago
- Easy-to-use interface for the Whisper model, optimized for all GPUs! ☆264 · Updated this week