mostlygeek / llama-swap
Reliable model swapping for any local OpenAI/Anthropic-compatible server - llama.cpp, vLLM, etc.
☆2,086 · Updated this week
Alternatives and similar repositories for llama-swap
Users interested in llama-swap are comparing it to the repositories listed below.
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,100 · Updated last week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆605 · Updated last week
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆796 · Updated 2 weeks ago
- Manifold is an experimental platform for enabling long-horizon workflow automation using teams of AI assistants. ☆475 · Updated last week
- Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https… ☆1,882 · Updated this week
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,201 · Updated last week
- Go manage your Ollama models ☆1,618 · Updated last week
- Large-scale LLM inference engine ☆1,611 · Updated last month
- VS Code extension for LLM-assisted code/text completion ☆1,106 · Updated last month
- ☆1,222 · Updated this week
- LLM frontend in a single HTML file ☆672 · Updated 2 weeks ago
- The AI toolkit for the AI developer ☆1,126 · Updated last week
- ☆228 · Updated 7 months ago
- A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React. ☆887 · Updated last month
- ☆670 · Updated 2 weeks ago
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 ☆1,402 · Updated last week
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints. ☆266 · Updated this week
- Docs for GGUF quantization (unofficial) ☆340 · Updated 5 months ago
- An OpenAI API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend ☆837 · Updated 10 months ago
- Create custom LLMs ☆1,786 · Updated last month
- A daemon that automatically manages the performance states of NVIDIA GPUs ☆103 · Updated last month
- The fastest way to fine-tune LLMs locally ☆330 · Updated last week
- Code execution utilities for Open WebUI & Ollama ☆310 · Updated last year
- Web UI for ExLlamaV2 ☆514 · Updated 10 months ago
- High-performance text-to-speech server with OpenAI-compatible API, 8 voices, emotion tags, and modern web UI. Optimized for RTX GPUs. ☆623 · Updated 5 months ago
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for AMD NPUs. ☆560 · Updated this week
- Big & small LLMs working together ☆1,230 · Updated this week
- Your trusty memory-enabled AI companion - simple RAG chatbot optimized for local LLMs | 12 languages supported | OpenAI API compatible ☆344 · Updated 9 months ago
- Open-WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into a powerful AI workstation. … ☆461 · Updated 2 weeks ago
- A tool to determine whether or not your PC can run a given LLM ☆166 · Updated 10 months ago