Docs for GGUF quantization (unofficial)
☆420 · Jul 19, 2025 · Updated 9 months ago
Alternatives and similar repositories for gguf-docs
Users that are interested in gguf-docs are comparing it to the libraries listed below.
- a character-ai like UI for LLM ☆10 · Dec 3, 2024 · Updated last year
- Writing Tools, Apple's AI-inspired app, enchants Windows, enhancing your pen with AI LLMs. One hotkey press, system-wide, fixes grammar, … ☆27 · Jul 26, 2025 · Updated 8 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Dec 25, 2024 · Updated last year
- CLI utility to inspect and explore .safetensors and .gguf files ☆51 · Oct 28, 2025 · Updated 5 months ago
- A llama.cpp simple wrapper in Swift ☆19 · Nov 9, 2025 · Updated 5 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆2,095 · Updated this week
- Enhancing LLMs with LoRA ☆218 · Oct 20, 2025 · Updated 5 months ago
- An OpenVoice-based voice cloning tool, single executable file (~14M), supporting multiple formats without dependencies on ffmpeg, Python,… ☆48 · Jan 18, 2026 · Updated 3 months ago
- A simple, easy-to-customize pipeline for local RAG evaluation. Starter prompts and metric definitions included. ☆25 · Jan 14, 2026 · Updated 3 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆51 · Oct 29, 2025 · Updated 5 months ago
- Video plugin for Mupen64Plus v2.0, based on the Arachnoid plugin for Project64. ☆19 · Mar 30, 2026 · Updated 2 weeks ago
- A minimal CLI tool for piping anything into an LLM. ☆21 · Jan 1, 2026 · Updated 3 months ago
- A tool for humans to interact with a Chroma vector database ☆16 · Mar 2, 2025 · Updated last year
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆36 · Jan 18, 2026 · Updated 3 months ago
- LLM backed Fantasy Tribe Game ☆19 · Nov 21, 2024 · Updated last year
- Kubernetes operator for local LLM inference with llama.cpp, vLLM, and TGI - multi-GPU, autoscaling, air-gapped, production-ready ☆48 · Apr 11, 2026 · Updated last week
- ☆234 · Oct 30, 2025 · Updated 5 months ago
- Metal GPU implementation of the Qwen3 transformer model on macOS with complete Apple Silicon compute shader acceleration. ☆43 · Oct 6, 2025 · Updated 6 months ago
- An extension for oobabooga/text-generation-webui that automatically unloads and reloads your model. ☆17 · Apr 22, 2024 · Updated last year
- ☆10 · Nov 3, 2025 · Updated 5 months ago
- Minimal web client for chatting and roleplay with AI characters ☆26 · Aug 21, 2025 · Updated 7 months ago
- Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc ☆3,212 · Apr 12, 2026 · Updated last week
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input a target size and the toolchain w… ☆108 · Updated this week
- A cross platform App that gives you the best UX to run models locally or remotely on your own hardware ☆78 · Mar 22, 2026 · Updated 3 weeks ago
- Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling ☆512 · Updated this week
- A universal adapter including zero-copy Python bindings for Philip Turner's metal flash attention library. ☆25 · Dec 15, 2025 · Updated 4 months ago
- ☆17 · Jun 22, 2024 · Updated last year
- A Rust-based, SenseVoiceSmall ☆30 · Apr 6, 2026 · Updated last week
- Thin wrapper around GGML to make life easier ☆45 · Nov 5, 2025 · Updated 5 months ago
- AI Based "Happiness Optimizer" ☆12 · Oct 20, 2024 · Updated last year
- A View Model framework written in rust, inspired by Next.js. ☆10 · May 29, 2023 · Updated 2 years ago
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆118 · Feb 24, 2026 · Updated last month
- Docker/podman container for llama.cpp/vllm/exllamav{2,3} orchestrated using llama-swap ☆18 · Apr 10, 2026 · Updated last week
- Battery level of WH-1000XM4 headphones and other series models, based on the WMI wrapper for Plug-and-Play devices. ☆15 · Dec 30, 2025 · Updated 3 months ago
- Desktop application for instant AI-powered text transformation. Translate, correct, summarize, and change the tone of any text, anywhere,… ☆30 · Dec 29, 2025 · Updated 3 months ago
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated 2 years ago
- AI Assistant ☆20 · Feb 21, 2026 · Updated last month
- Matrix multiplication on the NPU inside RK3588 ☆17 · Jun 27, 2024 · Updated last year
- ik_llama.cpp's Thireus fork with release builds for macOS/Windows/Ubuntu CPU, Vulkan and CUDA ☆98 · Updated this week