mostlygeek / llama-swap
Model swapping for llama.cpp (or any local OpenAI API compatible server)
☆1,615 · Updated last week
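llama-swap sits in front of llama.cpp (or another local OpenAI-compatible server) and loads or swaps the backing model based on the `model` field of incoming requests. As a minimal sketch of what using it looks like, the snippet below sends a chat completion request to a local instance; the port (8080) and model name ("qwen2.5-7b") are assumptions and need to match your own llama-swap configuration.

```python
# Minimal sketch: query a local OpenAI-compatible endpoint fronted by llama-swap.
# The host/port and the model name are assumptions -- llama-swap resolves the
# "model" field against the models defined in your config and starts/swap-loads
# the matching backend before proxying the request.
import json
import urllib.request

payload = {
    "model": "qwen2.5-7b",  # must correspond to a model entry in your llama-swap config
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, any existing OpenAI client or frontend can be pointed at it by changing the base URL; swapping which model serves the request is just a matter of changing the `model` string.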
Alternatives and similar repositories for llama-swap
Users interested in llama-swap are comparing it to the libraries listed below.
- llama.cpp fork with additional SOTA quants and improved performance ☆1,220 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,059 · Updated this week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆513 · Updated this week
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆773 · Updated 2 weeks ago
- Manifold is a platform for enabling workflow automation using AI assistants. ☆463 · Updated 2 months ago
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,082 · Updated last week
- LLM Frontend in a single html file ☆647 · Updated 8 months ago
- VS Code extension for LLM-assisted code/text completion ☆973 · Updated 2 weeks ago
- Large-scale LLM inference engine ☆1,560 · Updated last week
- Go manage your Ollama models ☆1,468 · Updated last week
- A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React. ☆843 · Updated 5 months ago
- ☆224 · Updated 4 months ago
- ☆1,155 · Updated last week
- An OpenAI API compatible text to speech server using Coqui AI's xtts_v2 and/or piper tts as the backend. ☆814 · Updated 8 months ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆1,385 · Updated this week
- Web UI for ExLlamaV2 ☆512 · Updated 8 months ago
- ☆313 · Updated this week
- The AI toolkit for the AI developer ☆960 · Updated this week
- Open-source LLMOps platform for hosting and scaling AI in your own infrastructure 🏓🦙 ☆1,312 · Updated 3 weeks ago
- Big & Small LLMs working together ☆1,170 · Updated last week
- The Fastest Way to Fine-Tune LLMs Locally ☆321 · Updated 6 months ago
- Docs for GGUF quantization (unofficial) ☆267 · Updated 2 months ago
- Lightweight Inference server for OpenVINO ☆212 · Updated last week
- Easy to use interface for the Whisper model optimized for all GPUs! ☆317 · Updated 2 months ago
- OpenAPI Tool Servers ☆685 · Updated last week
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆97 · Updated last week
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆339 · Updated 7 months ago
- ☆178 · Updated 3 weeks ago
- An AI memory layer with short- and long-term storage, semantic clustering, and optional memory decay for context-aware applications. ☆662 · Updated 8 months ago
- AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1 ☆211 · Updated 2 weeks ago