mostlygeek / llama-swap
Reliable model swapping for any local OpenAI/Anthropic-compatible server: llama.cpp, vLLM, etc.
☆1,977 · Updated this week
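Every server in this list speaks the same OpenAI-style HTTP interface, so a client only needs to change the base URL to swap backends. A minimal sketch of building such a request (the `/v1/chat/completions` path and payload shape follow the public OpenAI chat-completions format; the localhost port and model name are placeholder assumptions, not llama-swap defaults):

```python
import json
from urllib import request


def build_chat_request(base_url: str, model: str, prompt: str) -> request.Request:
    """Build a chat-completions request in the OpenAI-compatible format
    accepted by servers such as llama.cpp, vLLM, and llama-swap."""
    payload = {
        # For llama-swap, the model name selects which backend to swap in.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        url=f"{base_url}/v1/chat/completions",  # standard OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local endpoint -- the port depends on your server's config.
req = build_chat_request("http://localhost:8080", "qwen2.5-7b", "Hello!")
```

Sending the request (e.g. with `urllib.request.urlopen`) returns the usual OpenAI-shaped JSON response; only the base URL differs between backends.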
Alternatives and similar repositories for llama-swap
Users interested in llama-swap are comparing it to the repositories listed below.
- llama.cpp fork with additional SOTA quants and improved performance · ☆1,358 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast. · ☆1,096 · Updated last week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs · ☆586 · Updated last week
- ☆582 · Updated last week
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… · ☆789 · Updated last month
- VS Code extension for LLM-assisted code/text completion · ☆1,072 · Updated 2 weeks ago
- Effortlessly run LLM backends, APIs, frontends, and services with one command. · ☆2,161 · Updated last week
- LLM frontend in a single HTML file · ☆669 · Updated 2 weeks ago
- Manifold is a platform for enabling workflow automation using AI assistants. · ☆465 · Updated this week
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… · ☆1,752 · Updated last week
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. · ☆488 · Updated this week
- ☆1,215 · Updated this week
- Large-scale LLM inference engine · ☆1,600 · Updated last week
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints. · ☆254 · Updated this week
- A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React. · ☆878 · Updated 2 weeks ago
- Go manage your Ollama models · ☆1,593 · Updated 3 weeks ago
- ☆228 · Updated 6 months ago
- An OpenAI-API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend. · ☆837 · Updated 10 months ago
- Docs for GGUF quantization (unofficial) · ☆324 · Updated 4 months ago
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 · ☆1,379 · Updated last week
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. · ☆2,755 · Updated last month
- The AI toolkit for the AI developer · ☆1,097 · Updated this week
- LLM benchmark for throughput via Ollama (local LLMs) · ☆313 · Updated 3 months ago
- Create custom LLMs · ☆1,781 · Updated 3 weeks ago
- Big & small LLMs working together · ☆1,211 · Updated this week
- Web UI for ExLlamaV2 · ☆514 · Updated 10 months ago
- Optimizing inference proxy for LLMs · ☆3,204 · Updated this week
- LocalAGI is a powerful, self-hostable AI agent platform designed for maximum privacy and flexibility. A complete drop-in replacement for … · ☆1,393 · Updated this week
- The terminal client for Ollama · ☆2,270 · Updated last month
- AI inferencing at the edge. A simple one-file way to run various GGML models with KoboldAI's UI, with AMD ROCm offloading · ☆717 · Updated 2 weeks ago