Produce your own Dynamic 3.0 quants and achieve optimal accuracy and SOTA quantization performance! Input a target size and the toolchain will create a GGUF recipe tuned to your hardware within seconds: flexible model sizing and the lowest achievable perplexity/KLD for GGUF enthusiasts seeking precise, automated dynamic quant production.
☆124 · May 5, 2026 · Updated this week
Alternatives and similar repositories for GGUF-Tool-Suite
Users that are interested in GGUF-Tool-Suite are comparing it to the libraries listed below.
- Esobold - A fork of KoboldCPP with agent shenanigans and server-side saving! ☆30 · May 3, 2026 · Updated last week
- Inference Llama 2 in one file of pure Haskell (a port of llama2.c from Andrej Karpathy) ☆14 · Oct 17, 2025 · Updated 6 months ago
- A minimal CLI tool for piping anything into an LLM. ☆21 · Jan 1, 2026 · Updated 4 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆827 · Updated this week
- ☆56 · Oct 10, 2025 · Updated 6 months ago
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆170 · Apr 23, 2026 · Updated 2 weeks ago
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing. ☆25 · Sep 1, 2025 · Updated 8 months ago
- An image editor server and iOS app ☆19 · Jan 10, 2026 · Updated 3 months ago
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ☆13 · May 30, 2025 · Updated 11 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆2,379 · Updated this week
- OPSIIE (OPSIE) is an advanced Self-Centered Intelligence (SCI) prototype that represents a new paradigm in AI-human interaction. ☆25 · Oct 26, 2025 · Updated 6 months ago
- A collection of Python tools used to create GGUF files and upload them to Hugging Face ☆17 · Mar 28, 2026 · Updated last month
- ☆32 · Jul 20, 2024 · Updated last year
- A cross-platform app that gives you the best UX to run models locally or remotely on your own hardware ☆78 · Mar 22, 2026 · Updated last month
- Loader extension for tabbyAPI in SillyTavern ☆26 · Jun 30, 2025 · Updated 10 months ago
- Allows AMD GPUs to run CUDA-only software ☆83 · Apr 29, 2026 · Updated last week
- A fork of textgen that kept some things like Exllama and old GPTQ. ☆22 · Aug 20, 2024 · Updated last year
- 33B Chinese LLM, DPO QLoRA, 100K context, AirLLM 70B inference with a single 4GB GPU ☆13 · May 5, 2024 · Updated 2 years ago
- HDM model loader for ComfyUI ☆42 · Dec 14, 2025 · Updated 4 months ago
- ik_llama.cpp's Thireus fork with release builds for macOS/Windows/Ubuntu CPU, Vulkan and CUDA ☆122 · Updated this week
- CLI tool to quantize GGUF, GPTQ, AWQ, HQQ and EXL2 models ☆79 · Dec 17, 2024 · Updated last year
- Kubernetes operator for local LLM inference with llama.cpp, vLLM, and TGI: multi-GPU, autoscaling, air-gapped, production-ready ☆69 · Updated this week
- Make YouTube videos readable. Local-first Markdown summaries with Ollama, with cloud provider support. ☆63 · Dec 28, 2025 · Updated 4 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆108 · Apr 29, 2024 · Updated 2 years ago
- ☆18 · Jul 12, 2025 · Updated 9 months ago
- ☆31 · Nov 5, 2024 · Updated last year
- ☆77 · Jun 20, 2025 · Updated 10 months ago
- Thank you, LenAnderson, I am yoinking this! ☆27 · Updated this week
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime. ☆190 · Mar 23, 2026 · Updated last month
- Docker/Podman container for llama.cpp/vllm/exllamav{2,3} orchestrated using llama-swap ☆18 · Apr 10, 2026 · Updated 3 weeks ago
- A web app to explore topics using an LLM (less typing and more clicks) ☆67 · Mar 15, 2026 · Updated last month
- Desktop application for instant AI-powered text transformation. Translate, correct, summarize, and change the tone of any text, anywhere,… ☆32 · Dec 29, 2025 · Updated 4 months ago
- ☆64 · Jul 10, 2025 · Updated 9 months ago
- Learn faster with the power of AI ☆17 · Apr 29, 2026 · Updated last week
- A dynamic multi-expert AI architecture running on a single consumer GPU (RTX 3060). ☆36 · Dec 2, 2025 · Updated 5 months ago
- Sparse inferencing for transformer-based LLMs ☆218 · Mar 25, 2026 · Updated last month
- A GUI tool for creating training data for original AI voices without any recording ☆102 · Apr 18, 2026 · Updated 3 weeks ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆52 · Oct 29, 2025 · Updated 6 months ago
- ☆24 · Oct 13, 2025 · Updated 6 months ago