FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GPU support
★ 57 · Apr 27, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for flexllama
Users who are interested in flexllama are comparing it to the libraries listed below.
- Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. ★ 19 · Jan 10, 2025 · Updated last year
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ★ 61 · Feb 24, 2026 · Updated 2 months ago
- A Python-based chat application utilizing a Local LLM to generate complex thought chains for various use cases such as product developmen… ★ 20 · Feb 18, 2026 · Updated 3 months ago
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ★ 13 · May 30, 2025 · Updated 11 months ago
- llama-swap + a minimal ollama compatible api ★ 59 · Mar 14, 2026 · Updated 2 months ago
- ACE-Step: A Step Towards Music Generation Foundation Model ★ 50 · May 20, 2025 · Updated last year
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ★ 48 · Jul 18, 2025 · Updated 10 months ago
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ★ 39 · Dec 2, 2025 · Updated 5 months ago
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … ★ 52 · Feb 10, 2026 · Updated 3 months ago
- LLM Inference on consumer devices ★ 132 · Mar 17, 2025 · Updated last year
- A fully autonomous agent that accesses the browser and performs tasks. ★ 18 · Apr 25, 2025 · Updated last year
- Personal voice assistant, with voice interruption and Twilio support ★ 18 · Feb 24, 2025 · Updated last year
- ★ 16 · Dec 16, 2024 · Updated last year
- ★ 24 · Jan 22, 2025 · Updated last year
- A forward proxy to turn network traffic into personal memory for AI agents ★ 38 · Mar 30, 2026 · Updated last month
- ★ 21 · Jul 25, 2025 · Updated 9 months ago
- FlexAudioPrint is a Python-based app for transcribing audio to text using OpenAI's Whisper model. It offers a Gradio web interface and a … ★ 10 · Apr 22, 2026 · Updated 3 weeks ago
- ★ 13 · Jun 18, 2024 · Updated last year
- A comprehensive WebUI Toolkit for Resemble-AI's Chatterbox ★ 24 · Jun 7, 2025 · Updated 11 months ago
- ★ 12 · May 30, 2025 · Updated 11 months ago
- Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends ★ 58 · Aug 21, 2025 · Updated 9 months ago
- ★ 12 · Apr 21, 2025 · Updated last year
- Enable tool/function calling for any LLM, in OpenAI and Ollama API formats, adding universal function calling to models without native su… ★ 76 · Dec 9, 2025 · Updated 5 months ago
- ★ 16 · Mar 11, 2025 · Updated last year
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ★ 35 · May 11, 2026 · Updated last week
- General Tool-calling API Proxy ★ 59 · Mar 26, 2026 · Updated last month
- Hill Space is All You Need ★ 17 · Jul 11, 2025 · Updated 10 months ago
- The High Performance LLM Native Mock Server ★ 26 · Apr 26, 2026 · Updated 3 weeks ago
- ★ 10 · Jan 23, 2025 · Updated last year
- ContainerHub is a lightweight, dark-themed Streamlit dashboard for quickly accessing your local Docker services via Tailscale. Add links … ★ 33 · Jun 7, 2025 · Updated 11 months ago
- MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ★ 36 · Jan 18, 2026 · Updated 4 months ago
- Create text chunks which end at natural stopping points without using a tokenizer ★ 26 · Nov 26, 2025 · Updated 5 months ago
- Your Interface to Intelligence ★ 49 · Apr 23, 2026 · Updated 3 weeks ago
- Simple node proxy for llama-server that enables MCP use ★ 19 · May 10, 2025 · Updated last year
- A reverse proxy manager written in go, to convert exposed ports into token-based auth protected ports ★ 20 · Apr 14, 2025 · Updated last year
- Simple CLI tool streamlines the process of managing AI models from the CivitAI platform. It offers functionalities to list available mode… ★ 17 · May 3, 2025 · Updated last year
- Offline LLM chatbot with personalized memory; works on CPU with multi-session memory support. ★ 22 · Jan 10, 2026 · Updated 4 months ago
- Qt and QML based Close Combat-like game. ★ 16 · Aug 3, 2013 · Updated 12 years ago
- code for Towards Data Science article on prompt-loss-weight ★ 11 · Jun 4, 2025 · Updated 11 months ago