FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GPU support
☆52 · Mar 5, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for flexllama
Users who are interested in flexllama are comparing it to the libraries listed below.
- Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. ☆18 · Jan 10, 2025 · Updated last year
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ☆57 · Feb 24, 2026 · Updated 3 weeks ago
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ☆13 · May 30, 2025 · Updated 9 months ago
- llama-swap + a minimal ollama compatible api ☆52 · Mar 14, 2026 · Updated last week
- OpenAPI specifications => MCP (Model Context Protocol) tools ☆19 · Dec 9, 2024 · Updated last year
- ACE-Step: A Step Towards Music Generation Foundation Model ☆51 · May 20, 2025 · Updated 10 months ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆47 · Jul 18, 2025 · Updated 8 months ago
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ☆39 · Dec 2, 2025 · Updated 3 months ago
- LLM Inference on consumer devices ☆130 · Mar 17, 2025 · Updated last year
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … ☆52 · Feb 10, 2026 · Updated last month
- ☆66 · Jun 24, 2025 · Updated 8 months ago
- A fully autonomous agent that accesses the browser and performs tasks. ☆18 · Apr 25, 2025 · Updated 10 months ago
- Visually select, search, and copy your code into your clipboard for LLM context. ☆26 · May 18, 2025 · Updated 10 months ago
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine ☆26 · Dec 20, 2024 · Updated last year
- Personal voice assistant, with voice interruption and Twilio support ☆18 · Feb 24, 2025 · Updated last year
- Your Interface to Intelligence ☆44 · Updated this week
- ☆24 · Jan 22, 2025 · Updated last year
- ☆17 · Dec 16, 2024 · Updated last year
- A forward proxy to turn network traffic into personal memory for AI agents ☆36 · Updated this week
- ☆21 · Jul 25, 2025 · Updated 7 months ago
- A lightweight LLaMA.cpp HTTP server Docker image based on Alpine Linux. ☆32 · Oct 3, 2025 · Updated 5 months ago
- FlexAudioPrint is a Python-based app for transcribing audio to text using OpenAI's Whisper model. It offers a Gradio web interface and a … ☆10 · Jan 29, 2026 · Updated last month
- Cleanai (https://github.com/willmil11/cleanai) except I'm making it in c now. Fast and clean from the start this time :) ☆17 · Mar 6, 2026 · Updated 2 weeks ago
- ☆12 · May 30, 2025 · Updated 9 months ago
- A comprehensive WebUI Toolkit for Resemble-AI's Chatterbox ☆23 · Jun 7, 2025 · Updated 9 months ago
- Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends ☆55 · Aug 21, 2025 · Updated 7 months ago
- V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory! ☆36 · Nov 20, 2025 · Updated 4 months ago
- Crashbench is an LLM benchmark to measure bug-finding and reporting capabilities of LLMs ☆14 · Mar 8, 2026 · Updated last week
- Enable tool/function calling for any LLM, in OpenAI and Ollama API formats, adding universal function calling to models without native su… ☆71 · Dec 9, 2025 · Updated 3 months ago
- ☆15 · Mar 11, 2025 · Updated last year
- Simple node proxy for llama-server that enables MCP use ☆18 · May 10, 2025 · Updated 10 months ago
- General Tool-calling API Proxy ☆60 · Feb 21, 2026 · Updated last month
- Hill Space is All You Need ☆17 · Jul 11, 2025 · Updated 8 months ago
- ☆10 · Jan 23, 2025 · Updated last year
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆35 · Jan 18, 2026 · Updated 2 months ago
- A reverse proxy manager written in go, to convert exposed ports into token-based auth protected ports ☆20 · Apr 14, 2025 · Updated 11 months ago
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 · Nov 26, 2025 · Updated 3 months ago
- Enterprise-ready vector database toolkit for building searchable knowledge bases from multiple data sources. Supports multi-project manag… ☆32 · Updated this week
- Offline LLM chatbot with personalized memory - works on CPU with multi-session memory support. ☆22 · Jan 10, 2026 · Updated 2 months ago