mostlygeek / llama-swap
Reliable model swapping for any local OpenAI-compatible server (llama.cpp, vLLM, etc.)
☆1,730 · Updated this week
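The one-line description above can be made concrete: llama-swap is driven by a YAML config that maps model names (as sent in the OpenAI `model` field of a request) to the command that serves them, starting the matching backend on demand and stopping the previous one. A minimal sketch follows; the key names and paths are illustrative from memory of the project's README, so check the upstream docs for the exact schema:

```yaml
# Illustrative llama-swap config sketch (field names approximate).
# Each entry maps an OpenAI "model" name to the server command that
# llama-swap launches on first request for that model.
models:
  "qwen2.5-7b":
    cmd: llama-server --port ${PORT} -m /models/qwen2.5-7b-q4_k_m.gguf
  "llama3-8b":
    cmd: llama-server --port ${PORT} -m /models/llama3-8b-q4_k_m.gguf
```

A client then talks to llama-swap's single endpoint as if it were any OpenAI-compatible server; switching the `model` field is what triggers the swap.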
Alternatives and similar repositories for llama-swap
Users interested in llama-swap are comparing it to the repositories listed below.
- llama.cpp fork with additional SOTA quants and improved performance ☆1,266 · Updated this week
- The official API server for ExLlama. OpenAI-compatible, lightweight, and fast. ☆1,068 · Updated last week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆532 · Updated last week
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆781 · Updated last week
- Manifold is a platform for enabling workflow automation using AI assistants. ☆464 · Updated this week
- VS Code extension for LLM-assisted code/text completion ☆1,001 · Updated last week
- LLM frontend in a single HTML file ☆650 · Updated 9 months ago
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,117 · Updated 3 weeks ago
- ☆1,174 · Updated this week
- Large-scale LLM inference engine ☆1,567 · Updated 2 weeks ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆1,488 · Updated last week
- ☆226 · Updated 5 months ago
- Docs for GGUF quantization (unofficial) ☆286 · Updated 3 months ago
- ☆415 · Updated this week
- Go manage your Ollama models ☆1,494 · Updated 3 weeks ago
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 ☆1,337 · Updated 2 weeks ago
- An OpenAI-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend. ☆824 · Updated 8 months ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints. ☆215 · Updated this week
- A multi-platform desktop application to evaluate and compare LLMs, written in Rust and React. ☆847 · Updated 6 months ago
- Easy-to-use interface for the Whisper model, optimized for all GPUs! ☆372 · Updated 2 months ago
- The AI toolkit for the AI developer ☆1,021 · Updated this week
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. ☆352 · Updated last week
- OpenAPI Tool Servers ☆713 · Updated last month
- The Fastest Way to Fine-Tune LLMs Locally ☆322 · Updated 7 months ago
- Code execution utilities for Open WebUI & Ollama ☆300 · Updated 11 months ago
- A proxy server for multiple Ollama instances with key security ☆508 · Updated last week
- Your trusty memory-enabled AI companion: a simple RAG chatbot optimized for local LLMs | 12 languages supported | OpenAI API compatible ☆338 · Updated 7 months ago
- Multiple NVIDIA GPUs or Apple Silicon for large language model inference? ☆1,806 · Updated last year
- Web UI for ExLlamaV2 ☆510 · Updated 8 months ago
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆96 · Updated 3 weeks ago