FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GPU support
★59 · Jun 10, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for flexllama
Users that are interested in flexllama are comparing it to the libraries listed below.
- Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. ★20 · Jan 10, 2025 · Updated last year
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ★62 · Feb 24, 2026 · Updated 4 months ago
- A Python-based chat application utilizing a Local LLM to generate complex thought chains for various use cases such as product developmen… ★20 · Feb 18, 2026 · Updated 4 months ago
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ★12 · May 30, 2025 · Updated last year
- OpenAPI specifications => MCP (Model Context Protocol) tools ★19 · Dec 9, 2024 · Updated last year
- ACE-Step: A Step Towards Music Generation Foundation Model ★50 · May 20, 2025 · Updated last year
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ★50 · Jul 18, 2025 · Updated 11 months ago
- ★65 · Jun 24, 2025 · Updated last year
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … ★52 · Feb 10, 2026 · Updated 4 months ago
- A fully autonomous agent that accesses the browser and performs tasks. ★18 · Apr 25, 2025 · Updated last year
- Visually select, search, and copy your code into your clipboard for LLM context. ★26 · May 18, 2025 · Updated last year
- Personal voice assistant, with voice interruption and Twilio support ★18 · Feb 24, 2025 · Updated last year
- ★24 · Jan 22, 2025 · Updated last year
- ★16 · Dec 16, 2024 · Updated last year
- A forward proxy to turn network traffic into personal memory for AI agents ★38 · Mar 30, 2026 · Updated 3 months ago
- ★21 · Jul 25, 2025 · Updated 11 months ago
- FlexAudioPrint is a Python-based app for transcribing audio to text using OpenAI's Whisper model. It offers a Gradio web interface and a … ★10 · Apr 22, 2026 · Updated 2 months ago
- ★13 · Jun 18, 2024 · Updated 2 years ago
- A lightweight LLaMA.cpp HTTP server Docker image based on Alpine Linux. ★39 · Jun 8, 2026 · Updated 3 weeks ago
- Cleanai (https://github.com/willmil11/cleanai) except I'm making it in c now. Fast and clean from the start this time :) ★15 · Jun 16, 2026 · Updated 2 weeks ago
- A comprehensive WebUI Toolkit for Resemble-AI's Chatterbox ★26 · Jun 7, 2025 · Updated last year
- ★12 · May 30, 2025 · Updated last year
- Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends ★59 · Aug 21, 2025 · Updated 10 months ago
- Crashbench is an LLM benchmark to measure bug-finding and reporting capabilities of LLMs ★14 · Mar 8, 2026 · Updated 3 months ago
- ★12 · Apr 21, 2025 · Updated last year
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ★36 · May 11, 2026 · Updated last month
- ★17 · Mar 11, 2025 · Updated last year
- General Tool-calling API Proxy ★61 · Mar 26, 2026 · Updated 3 months ago
- Hill Space is All You Need ★17 · Jul 11, 2025 · Updated 11 months ago
- The High Performance LLM Native Mock Server ★34 · May 24, 2026 · Updated last month
- ★10 · Jan 23, 2025 · Updated last year
- Create text chunks which end at natural stopping points without using a tokenizer ★26 · Nov 26, 2025 · Updated 7 months ago
- Your Interface to Intelligence ★54 · Updated this week
- Simple node proxy for llama-server that enables MCP use ★19 · May 10, 2025 · Updated last year
- Simple CLI tool streamlines the process of managing AI models from the CivitAI platform. It offers functionalities to list available mode… ★17 · May 3, 2025 · Updated last year
- Offline LLM chatbot with personalized memory - works on CPU with multi-session memory support. ★22 · Jan 10, 2026 · Updated 5 months ago
- A platform for Interactive AI-assisted Hypothesis Generation [ACL 2025] ★34 · May 10, 2026 · Updated last month
- Qt and QML based Close Combat-like game. ★16 · Aug 3, 2013 · Updated 12 years ago
- A Python script to auto-detect and auto-crop a person in an image ★16 · Mar 7, 2026 · Updated 3 months ago