Docs for GGUF quantization (unofficial)
☆453 · Jul 19, 2025 · Updated 9 months ago
Alternatives and similar repositories for gguf-docs
Users that are interested in gguf-docs are comparing it to the libraries listed below.
- a character-ai like UI for LLM ☆10 · Dec 3, 2024 · Updated last year
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing. ☆25 · Sep 1, 2025 · Updated 8 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Dec 25, 2024 · Updated last year
- Neural Audio Codecs implemented in C# - DAC, SNAC, Encodec, Dia ☆46 · Jun 11, 2025 · Updated 10 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆173 · Jul 5, 2025 · Updated 10 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆2,379 · Updated this week
- A simple, easy-to-customize pipeline for local RAG evaluation. Starter prompts and metric definitions included. ☆24 · Jan 14, 2026 · Updated 3 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆52 · Oct 29, 2025 · Updated 6 months ago
- An OpenVoice-based voice cloning tool, single executable file (~14M), supporting multiple formats without dependencies on ffmpeg, Python,… ☆48 · Jan 18, 2026 · Updated 3 months ago
- For converting LLM datasets from one format into another. ☆22 · Nov 12, 2025 · Updated 5 months ago
- A tool for humans to interact with a Chroma vector database ☆16 · Apr 25, 2026 · Updated 2 weeks ago
- The hearth of The Pulsar App, fast, secure and shared inference with modern UI ☆60 · Dec 1, 2024 · Updated last year
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆36 · Jan 18, 2026 · Updated 3 months ago
- LLM backed Fantasy Tribe Game ☆19 · Nov 21, 2024 · Updated last year
- A curated collection of OpenClaw resources: GCP installation guide, best practices, and use cases ☆20 · Feb 19, 2026 · Updated 2 months ago
- Context7 Scoring Library ☆31 · Sep 19, 2025 · Updated 7 months ago
- Web UI for working with large language models ☆39 · Jun 13, 2024 · Updated last year
- Metal GPU implementation of the Qwen3 transformer model on macOS with complete Apple Silicon compute shader acceleration. ☆45 · Oct 6, 2025 · Updated 7 months ago
- Locally hosted AI Agent Python Tool To Generate Novel Research Hypothesis + Titles + Abstracts ☆30 · Apr 30, 2025 · Updated last year
- ☆238 · Oct 30, 2025 · Updated 6 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆22 · Updated this week
- An extension for oobabooga/text-generation-webui that automatically unloads and reloads your model. ☆17 · Apr 22, 2024 · Updated 2 years ago
- ☆40 · Feb 25, 2026 · Updated 2 months ago
- Kubernetes operator for local LLM inference with llama.cpp, vLLM, and TGI - multi-GPU, autoscaling, air-gapped, production-ready ☆69 · Updated this week
- This is an official repository of the Kion Movies Recommendation Dataset. ☆12 · Sep 2, 2022 · Updated 3 years ago
- ☆12 · Apr 21, 2026 · Updated 2 weeks ago
- JotItNow is an AI Voice Notes App ☆25 · Mar 6, 2025 · Updated last year
- Minimal web client for chatting and roleplay with AI characters ☆26 · Aug 21, 2025 · Updated 8 months ago
- [Arxiv 2026] ActionPlan: Future-Aware Streaming Motion Synthesis via Frame-Level Action Planning ☆81 · Mar 26, 2026 · Updated last month
- ☆17 · Jun 22, 2024 · Updated last year
- A cross platform App that gives you the best UX to run models locally or remotely on your own hardware ☆78 · Mar 22, 2026 · Updated last month
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input a target size and the toolchain w… ☆124 · Updated this week
- A universal adapter including zero-copy Python bindings for Philip Turner's metal flash attention library. ☆26 · Dec 15, 2025 · Updated 4 months ago
- Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc ☆3,772 · May 1, 2026 · Updated last week
- Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling ☆520 · Updated this week
- A Rust-based, SenseVoiceSmall ☆32 · Apr 27, 2026 · Updated last week
- Thin wrapper around GGML to make life easier ☆45 · Nov 5, 2025 · Updated 6 months ago
- ☆13 · Feb 26, 2025 · Updated last year
- AI Based "Happiness Optimizer" ☆12 · Oct 20, 2024 · Updated last year