mostlygeek / llama-swap
Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.)
☆2,176 · Updated this week
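llama-swap sits in front of local inference backends and exposes a standard OpenAI-compatible HTTP API; the `model` field of each request determines which backend it swaps in before proxying the call. As a minimal client-side sketch, assuming a llama-swap instance listening on http://localhost:8080 with a model named "llama3" defined in its config (both the address and the model name are assumptions, not taken from this page):

```python
# Minimal sketch of a client talking to a llama-swap instance.
# Assumptions: llama-swap is listening on http://localhost:8080 and a
# model named "llama3" exists in its config; adjust both to your setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        # The model name selects which backend llama-swap starts/swaps to.
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=120,  # generous timeout: the first request may wait for a backend to load
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Any OpenAI SDK pointed at the same base URL works the same way, since the proxy keys off the requested model name rather than anything client-specific.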
Alternatives and similar repositories for llama-swap
Users interested in llama-swap are comparing it to the repositories listed below.
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,110 · Updated 3 weeks ago
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,239 · Updated last week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆616 · Updated last week
- Manifold is an experimental platform for enabling long-horizon workflow automation using teams of AI assistants. ☆475 · Updated this week
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆795 · Updated last week
- VS Code extension for LLM-assisted code/text completion ☆1,124 · Updated last week
- Go manage your Ollama models ☆1,634 · Updated 2 weeks ago
- Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https… ☆1,985 · Updated this week
- An OpenAI API compatible text-to-speech server using Coqui AI's xtts_v2 and/or piper tts as the backend. ☆842 · Updated 11 months ago
- LLM frontend in a single HTML file ☆681 · Updated 2 weeks ago
- Large-scale LLM inference engine ☆1,613 · Updated last week
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 ☆1,414 · Updated last week
- Docs for GGUF quantization (unofficial) ☆347 · Updated 5 months ago
- The AI toolkit for the AI developer ☆1,165 · Updated this week
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. ☆612 · Updated last week
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference; more devices mean faster inference. ☆2,786 · Updated last month
- OpenAPI Tool Servers ☆799 · Updated 3 months ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆273 · Updated 2 weeks ago
- A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React. ☆897 · Updated last month
- Big & Small LLMs working together ☆1,241 · Updated this week
- LocalAGI is a powerful, self-hostable AI Agent platform designed for maximum privacy and flexibility. A complete drop-in replacement for … ☆1,492 · Updated last week
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆216 · Updated last month
- AI Inferencing at the Edge. A simple one-file way to run various GGML models with KoboldAI's UI, with AMD ROCm offloading ☆725 · Updated 2 weeks ago
- The Fastest Way to Fine-Tune LLMs Locally ☆332 · Updated 3 weeks ago
- Pipelines: Versatile, UI-Agnostic OpenAI-Compatible Plugin Framework ☆2,239 · Updated 4 months ago
- Web UI for ExLlamaV2 ☆514 · Updated 11 months ago
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆346 · Updated 10 months ago