mostlygeek / llama-swap
Model swapping for llama.cpp (or any local OpenAI-compatible server)
☆1,088 · Updated this week
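llama-swap works as a proxy in front of local inference backends: it exposes the standard OpenAI-compatible API and uses the `model` field of each incoming request to decide which configured backend to launch. A minimal sketch of such a request payload, assuming a hypothetical llama-swap instance at `http://localhost:8080` with a model configured under the (also hypothetical) name `qwen-7b`:

```python
import json

# Hypothetical endpoint; llama-swap proxies the OpenAI-compatible
# chat completions API, so any standard OpenAI client can talk to it.
LLAMA_SWAP_URL = "http://localhost:8080/v1/chat/completions"

# The "model" field is what a model-swapping proxy keys on: when a request
# names a model that is not currently running, the proxy stops the active
# backend and starts the one configured under this name before forwarding.
payload = {
    "model": "qwen-7b",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# POSTing this JSON to LLAMA_SWAP_URL (e.g. with urllib.request or curl)
# would return a standard chat completion response once the backend is up.
print(json.dumps(payload, indent=2))
```

Because the request shape is the stock OpenAI format, swapping is transparent to clients: only the `model` name changes between backends.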
Alternatives and similar repositories for llama-swap
Users interested in llama-swap are comparing it to the repositories listed below.
- llama.cpp fork with additional SOTA quants and improved performance ☆902 · Updated this week
- Manifold is a platform for enabling workflow automation using AI assistants. ☆455 · Updated this week
- What If Language Models Expertly Routed All Inference? WilmerAI allows prompts to be routed to specialized workflows based on the domain … ☆731 · Updated 2 weeks ago
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,016 · Updated this week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆447 · Updated last week
- ☆213 · Updated 2 months ago
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆1,942 · Updated 3 weeks ago
- LLM frontend in a single HTML file ☆525 · Updated 6 months ago
- ☆938 · Updated last week
- Large-scale LLM inference engine ☆1,482 · Updated last week
- The AI toolkit for the AI developer ☆838 · Updated this week
- The Fastest Way to Fine-Tune LLMs Locally ☆312 · Updated 4 months ago
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆319 · Updated 5 months ago
- Web UI for ExLlamaV2 ☆506 · Updated 5 months ago
- Code execution utilities for Open WebUI & Ollama ☆290 · Updated 8 months ago
- Lightweight inference server for OpenVINO ☆190 · Updated last week
- An OpenAI API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend. ☆797 · Updated 6 months ago
- Local LLM Powered Recursive Search & Smart Knowledge Explorer ☆248 · Updated 5 months ago
- VS Code extension for LLM-assisted code/text completion ☆858 · Updated last week
- Local LLM Server with GPU and NPU Acceleration ☆296 · Updated last week
- Big & Small LLMs working together ☆1,088 · Updated this week
- A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React. ☆798 · Updated 3 months ago
- Easy-to-use interface for the Whisper model optimized for all GPUs! ☆241 · Updated last week
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆89 · Updated last month
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆601 · Updated 9 months ago
- ☆204 · Updated last week
- Docs for GGUF quantization (unofficial) ☆192 · Updated 2 weeks ago
- InferX is an Inference Function-as-a-Service platform ☆117 · Updated last week
- Collection of LLM system prompts. ☆135 · Updated this week
- This project demonstrates a basic chain-of-thought interaction with any LLM (Large Language Model) ☆321 · Updated 10 months ago