mostlygeek / llama-swap
Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.)
☆2,311 · Updated this week
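Because llama-swap sits behind the standard OpenAI-compatible HTTP interface, clients do not need any special integration: they send an ordinary chat-completions request, and the proxy uses the `model` field to decide which backend to start or swap in. The sketch below builds such a payload; the model name `my-local-model` is a hypothetical placeholder that would need to match an entry in the proxy's configuration, and the payload shape follows the public OpenAI chat-completions schema rather than anything llama-swap-specific.

```python
import json


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a standard OpenAI-style chat-completions payload.

    A swapping proxy such as llama-swap forwards this unchanged; the
    "model" field is what it matches against its config to decide which
    backend process (llama.cpp, vLLM, ...) should serve the request.
    """
    return {
        "model": model,  # hypothetical name; must match the proxy's config
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("my-local-model", "Hello!")
print(json.dumps(payload, indent=2))
```

The key design point is that swapping is transparent: switching between backends is driven entirely by the `model` string, so existing OpenAI-client code works without modification.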
Alternatives and similar repositories for llama-swap
Users interested in llama-swap are comparing it to the repositories listed below.
- llama.cpp fork with additional SOTA quants and improved performance ☆1,587 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,121 · Updated 2 weeks ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆622 · Updated last week
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆801 · Updated last month
- ☆857 · Updated last week
- Manifold is an experimental platform for enabling long-horizon workflow automation using teams of AI assistants. ☆478 · Updated this week
- ☆1,243 · Updated this week
- VS Code extension for LLM-assisted code/text completion ☆1,139 · Updated 2 weeks ago
- Large-scale LLM inference engine ☆1,641 · Updated 2 weeks ago
- Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https… ☆2,086 · Updated this week
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,389 · Updated last week
- The Python library for research and development in NLP, multimodal LLMs, Agents, ML, Knowledge Graphs, and more. ☆1,188 · Updated this week
- A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React. ☆905 · Updated 2 months ago
- Docs for GGUF quantization (unofficial) ☆361 · Updated 6 months ago
- ☆230 · Updated 8 months ago
- LLM frontend in a single HTML file ☆692 · Updated last month
- Go manage your Ollama models ☆1,660 · Updated last month
- An OpenAI API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend. ☆849 · Updated last year
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 ☆1,430 · Updated 2 weeks ago
- The Fastest Way to Fine-Tune LLMs Locally ☆333 · Updated last month
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints. ☆290 · Updated last week
- Big & small LLMs working together ☆1,258 · Updated this week
- Your Trusty Memory-enabled AI Companion - simple RAG chatbot optimized for local LLMs | 12 languages supported | OpenAI API compatible ☆346 · Updated 11 months ago
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for AMD NPUs. ☆689 · Updated this week
- tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer' (Open Source N… ☆1,237 · Updated this week
- OpenAPI Tool Servers ☆821 · Updated 4 months ago
- Web UI for ExLlamaV2 ☆513 · Updated 11 months ago
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆110 · Updated 3 months ago
- End-to-end documentation to set up your own local & fully private LLM server on Debian. Equipped with chat, web search, RAG, model manage… ☆674 · Updated this week
- A tool to determine whether or not your PC can run a given LLM ☆167 · Updated last year