FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GPU support (☆50, Feb 17, 2026, updated last week)
Alternatives and similar repositories for flexllama
Users that are interested in flexllama are comparing it to the libraries listed below
- Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. (☆18, Jan 10, 2025, updated last year)
- FlexAudioPrint is a Python-based app for transcribing audio to text using OpenAI's Whisper model. It offers a Gradio web interface and a … (☆10, Jan 29, 2026, updated last month)
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… (☆56, updated this week)
- llama-swap + a minimal Ollama-compatible API (☆49, Feb 13, 2026, updated 2 weeks ago)
- SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profi… (☆54, updated this week)
- Simple node proxy for llama-server that enables MCP use (☆17, May 10, 2025, updated 9 months ago)
- LLMProxy is an intelligent large language model backend routing proxy service. (☆22, Dec 6, 2025, updated 2 months ago)
- A reverse proxy manager written in Go that converts exposed ports into ports protected by token-based auth (☆20, Apr 14, 2025, updated 10 months ago)
- Offline LLM chatbot with personalized memory; works on CPU with multi-session memory support. (☆22, Jan 10, 2026, updated last month)
- OpenAPI specifications => MCP (Model Context Protocol) tools (☆19, Dec 9, 2024, updated last year)
- Measuring Thinking Efficiency in Reasoning Models - Research Repository (☆39, Dec 2, 2025, updated 2 months ago)
- A local-first LLM development studio. Build, test, and customize inference workflows with your own models: no cloud, totally local. (☆17, May 21, 2025, updated 9 months ago)
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … (☆51, Feb 10, 2026, updated 2 weeks ago)
- (☆64, Jun 24, 2025, updated 8 months ago)
- Personal voice assistant, with voice interruption and Twilio support (☆18, Feb 24, 2025, updated last year)
- A fully autonomous agent that accesses the browser and performs tasks. (☆17, Apr 25, 2025, updated 10 months ago)
- ContainerHub is a lightweight, dark-themed Streamlit dashboard for quickly accessing your local Docker services via Tailscale. Add links … (☆33, Jun 7, 2025, updated 8 months ago)
- (☆17, Dec 16, 2024, updated last year)
- A Python-based chat application utilizing a Local LLM to generate complex thought chains for various use cases such as product developmen… (☆20, Feb 18, 2026, updated last week)
- A forward proxy to turn network traffic into personal memory for AI agents (☆36, Feb 23, 2026, updated last week)
- LLM Inference on consumer devices (☆130, Mar 17, 2025, updated 11 months ago)
- ACE-Step: A Step Towards Music Generation Foundation Model (☆49, May 20, 2025, updated 9 months ago)
- (☆21, Jul 25, 2025, updated 7 months ago)
- The rent a hal project for AI (☆22, Aug 12, 2025, updated 6 months ago)
- AI debugger and AI coder integrated. Use AI to code and drive the runtime debugger (☆83, Nov 25, 2025, updated 3 months ago)
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine (☆26, Dec 20, 2024, updated last year)
- Visually select, search, and copy your code into your clipboard for LLM context. (☆26, May 18, 2025, updated 9 months ago)
- (☆24, Jan 22, 2025, updated last year)
- Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G… (☆26, Mar 28, 2025, updated 11 months ago)
- Material You TUI for monitoring NVIDIA GPUs (☆58, Jan 16, 2026, updated last month)
- Open WebUI, ComfyUI, n8n, LocalAI, LLM Proxy, SearXNG, Qdrant, Postgres all in docker compose (☆66, Oct 26, 2024, updated last year)
- Enable tool/function calling for any LLM, in OpenAI and Ollama API formats, adding universal function calling to models without native su… (☆69, Dec 9, 2025, updated 2 months ago)
- Analyze Reddit posts (☆30, Feb 27, 2025, updated last year)
- Create text chunks which end at natural stopping points without using a tokenizer (☆26, Nov 26, 2025, updated 3 months ago)
- Autonomous, agentic, creative story writing system that incorporates stored embeddings and Knowledge Graphs. (☆95, Feb 16, 2026, updated last week)
- Simply paste your GitHub repo link and this app will generate a relevant Dockerfile + docker-compose.yaml to easily deploy any repo/proje… (☆73, May 6, 2025, updated 9 months ago)
- (☆53, Oct 10, 2025, updated 4 months ago)
- Moondream MCP Server in Python (☆44, Jul 2, 2025, updated 7 months ago)
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. (☆35, Feb 11, 2026, updated 2 weeks ago)