FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GPU support
☆58 · Apr 25, 2026 · Updated this week
Alternatives and similar repositories for flexllama
Users that are interested in flexllama are comparing it to the libraries listed below.
- Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. ☆19 · Jan 10, 2025 · Updated last year
- The most feature-complete local AI workstation. Multi-GPU inference, integrated Stable Diffusion + ADetailer, voice cloning, research-gra… ☆59 · Feb 24, 2026 · Updated 2 months ago
- A Python-based chat application utilizing a Local LLM to generate complex thought chains for various use cases such as product developmen… ☆20 · Feb 18, 2026 · Updated 2 months ago
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ☆13 · May 30, 2025 · Updated 11 months ago
- llama-swap + a minimal ollama compatible api ☆58 · Mar 14, 2026 · Updated last month
- ACE-Step: A Step Towards Music Generation Foundation Model ☆50 · May 20, 2025 · Updated 11 months ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆48 · Jul 18, 2025 · Updated 9 months ago
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ☆39 · Dec 2, 2025 · Updated 4 months ago
- LLM Inference on consumer devices ☆129 · Mar 17, 2025 · Updated last year
- A fully autonomous agent that accesses the browser and performs tasks. ☆18 · Apr 25, 2025 · Updated last year
- Visually select, search, and copy your code into your clipboard for LLM context. ☆26 · May 18, 2025 · Updated 11 months ago
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine ☆26 · Dec 20, 2024 · Updated last year
- Personal voice assistant, with voice interruption and Twilio support ☆18 · Feb 24, 2025 · Updated last year
- ☆17 · Dec 16, 2024 · Updated last year
- ☆24 · Jan 22, 2025 · Updated last year
- ☆21 · Jul 25, 2025 · Updated 9 months ago
- A lightweight LLaMA.cpp HTTP server Docker image based on Alpine Linux. ☆35 · Apr 9, 2026 · Updated 3 weeks ago
- FlexAudioPrint is a Python-based app for transcribing audio to text using OpenAI's Whisper model. It offers a Gradio web interface and a … ☆10 · Apr 22, 2026 · Updated last week
- Automates the creation of full-text (sound and text) ebooks in epub/epub3/daisy format, the webserver/client creates smil files to sync a… ☆10 · Nov 12, 2021 · Updated 4 years ago
- A comprehensive WebUI Toolkit for Resemble-AI's Chatterbox ☆24 · Jun 7, 2025 · Updated 10 months ago
- ☆12 · May 30, 2025 · Updated 11 months ago
- V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory! ☆36 · Nov 20, 2025 · Updated 5 months ago
- Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends ☆57 · Aug 21, 2025 · Updated 8 months ago
- Enable tool/function calling for any LLM, in OpenAI and Ollama API formats, adding universal function calling to models without native su… ☆76 · Dec 9, 2025 · Updated 4 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆34 · Apr 11, 2026 · Updated 2 weeks ago
- Simple node proxy for llama-server that enables MCP use ☆19 · May 10, 2025 · Updated 11 months ago
- General Tool-calling API Proxy ☆60 · Mar 26, 2026 · Updated last month
- Hill Space is All You Need ☆17 · Jul 11, 2025 · Updated 9 months ago
- The High Performance LLM Native Mock Server ☆25 · Updated this week
- ☆10 · Jan 23, 2025 · Updated last year
- ContainerHub is a lightweight, dark-themed Streamlit dashboard for quickly accessing your local Docker services via Tailscale. Add links … ☆33 · Jun 7, 2025 · Updated 10 months ago
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 · Nov 26, 2025 · Updated 5 months ago
- Your Interface to Intelligence ☆48 · Apr 23, 2026 · Updated last week
- Simple CLI tool streamlines the process of managing AI models from the CivitAI platform. It offers functionalities to list available mode… ☆17 · May 3, 2025 · Updated 11 months ago
- Offline LLM chatbot with personalized memory - works on CPU with multi-session memory support. ☆22 · Jan 10, 2026 · Updated 3 months ago
- Orchestrator Kit for Agentic Reasoning - OrKa is a modular AI orchestration system that transforms Large Language Models (LLMs) into comp… ☆94 · Apr 12, 2026 · Updated 2 weeks ago
- Qt and QML based Close Combat-like game. ☆16 · Aug 3, 2013 · Updated 12 years ago
- A Python script to auto-detect and auto-crop a person in an image ☆16 · Mar 7, 2026 · Updated last month
- Moondream MCP Server in Python ☆46 · Jul 2, 2025 · Updated 9 months ago