yazon/flexllama

🚀 FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GPU support

☆50

Alternatives and similar repositories for flexllama

Users who are interested in flexllama compare it to the repositories listed below.

robbiemu / llama-gguf-optimize
Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices.
☆18 · Jan 10, 2025 · Updated last year
loglux / FlexAudioPrint
FlexAudioPrint is a Python-based app for transcribing audio to text using OpenAI's Whisper model. It offers a Gradio web interface and a …
☆10 · Jan 29, 2026 · Updated last month
boneylizard / Eloquent
The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra…
☆56 · Updated this week
kooshi / llama-swappo
llama-swap + a minimal ollama compatible api
☆49 · Feb 13, 2026 · Updated 2 weeks ago
peva3 / SmarterRouter
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profi…
☆54 · Updated this week
extopico / llama-server_mcp_proxy
Simple node proxy for llama-server that enables MCP use
☆17 · May 10, 2025 · Updated 9 months ago
obirler / LLMProxy
LLMProxy is an intelligent large language model backend routing proxy service.
☆22 · Dec 6, 2025 · Updated 2 months ago
Heaust-ops / rauxy
A reverse proxy manager written in Go, to convert exposed ports into token-based auth protected ports
☆20 · Apr 14, 2025 · Updated 10 months ago
lynthera / bitsegments_localminds
Offline LLM chatbot with personalized memory; works on CPU with multi-session memory support.
☆22 · Jan 10, 2026 · Updated last month
2013xile / openapi2mcptools
OpenAPI specifications => MCP (Model Context Protocol) tools
☆19 · Dec 9, 2024 · Updated last year
cpldcpu / LRMTokenEconomy
Measuring Thinking Efficiency in Reasoning Models - Research Repository
☆39 · Dec 2, 2025 · Updated 2 months ago
Thrasher-Software / sigil
A local-first LLM development studio. Build, test, and customize inference workflows with your own models: no cloud, totally local.
☆17 · May 21, 2025 · Updated 9 months ago
Toy-97 / Chat-WebUI
Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search …
☆51 · Feb 10, 2026 · Updated 2 weeks ago
ideaweaver-ai-code / ideaweaver
☆64 · Jun 24, 2025 · Updated 8 months ago
thooton / aspen
Personal voice assistant, with voice interruption and Twilio support
☆18 · Feb 24, 2025 · Updated last year
wchisasa / rabbit
A fully autonomous agent that accesses the browser and performs tasks.
☆17 · Apr 25, 2025 · Updated 10 months ago
ronnie-1205 / ContainerHub
ContainerHub is a lightweight, dark-themed Streamlit dashboard for quickly accessing your local Docker services via Tailscale. Add links …
☆33 · Jun 7, 2025 · Updated 8 months ago
charmandercha / ArchiDoc
☆17 · Dec 16, 2024 · Updated last year
devinambron / PyThoughtChain
A Python-based chat application utilizing a Local LLM to generate complex thought chains for various use cases such as product developmen…
☆20 · Feb 18, 2026 · Updated last week
kstonekuan / digital-twin-proxy
A forward proxy to turn network traffic into personal memory for AI agents
☆36 · Feb 23, 2026 · Updated last week
Infini-AI-Lab / UMbreLLa
LLM Inference on consumer devices
☆130 · Mar 17, 2025 · Updated 11 months ago
PasiKoodaa / ACE-Step-RADIO
ACE-Step: A Step Towards Music Generation Foundation Model
☆49 · May 20, 2025 · Updated 9 months ago
rohinmanvi / Capability-Aware-and-Mid-Generation-Self-Evaluations
☆21 · Jul 25, 2025 · Updated 7 months ago
jimpames / rentahal
the rent a hal project for AI
☆22 · Aug 12, 2025 · Updated 6 months ago
Zentar-Ai / Zentara-Code
AI debugger and AI coder integrated. Use AI to code and drive the runtime debugger
☆83 · Nov 25, 2025 · Updated 3 months ago
severian42 / Proteus-The-Genesis-LLM
Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine
☆26 · Dec 20, 2024 · Updated last year
rickkoh / plainrepo
Visually select, search, and copy your code into your clipboard for LLM context.
☆26 · May 18, 2025 · Updated 9 months ago
rodrigobaron / anthill
☆24 · Jan 22, 2025 · Updated last year
zenforic / csm-multi
Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G…
☆26 · Mar 28, 2025 · Updated 11 months ago
gvlassis / gvtop
🎮 Material You TUI for monitoring NVIDIA GPUs
☆58 · Jan 16, 2026 · Updated last month
j4ys0n / local-ai-stack
Open WebUI, ComfyUI, n8n, LocalAI, LLM Proxy, SearXNG, Qdrant, Postgres all in docker compose
☆66 · Oct 26, 2024 · Updated last year
Oct4Pie / toolbridge
Enable tool/function calling for any LLM, in OpenAI and Ollama API formats, adding universal function calling to models without native su…
☆69 · Dec 9, 2025 · Updated 2 months ago
summersonnn / reddit_analyzer
Analyze Reddit posts
☆30 · Feb 27, 2025 · Updated last year
jabberjabberjabber / Chunkify
Create text chunks which end at natural stopping points without using a tokenizer
☆26 · Nov 26, 2025 · Updated 3 months ago
Lanerra / saga
Autonomous, agentic, creative story writing system that incorporates stored embeddings and Knowledge Graphs.
☆95 · Feb 16, 2026 · Updated last week
jontstaz / AI-Docker-Compose-Generator
Simply paste your GitHub repo link and this app will generate a relevant Dockerfile + docker-compose.yaml to easily deploy any repo/proje…
☆73 · May 6, 2025 · Updated 9 months ago
k-koehler / gguf-tensor-overrider
☆53 · Oct 10, 2025 · Updated 4 months ago
ColeMurray / moondream-mcp
Moondream MCP Server in Python
☆44 · Jul 2, 2025 · Updated 7 months ago
stringandstickytape / MaxsAiStudio
A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others.
☆35 · Feb 11, 2026 · Updated 2 weeks ago