Thireus / GGUF-Tool-Suite
GGUF Tool Suite is a set of flexible utilities that enable users to experiment with and create custom GGUF quantisation blends. It simplifies the process of mixing quant types (such as iq3_xxs, iq4_kt, and iq1_s_r4) to optimise performance, reduce model size, and preserve accuracy across different hardware and use cases.
☆22 · Updated this week
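The core idea behind a quantisation blend is a recipe that maps tensor-name patterns to quant types, so different parts of the model can trade size for accuracy independently. Below is a minimal illustrative sketch in Python; the recipe layout, tensor-name patterns, and quant choices are assumptions made for illustration, not the suite's actual recipe format.

```python
import re

# Hypothetical recipe: ordered list of (tensor-name regex, quant type).
# Patterns and quant choices are illustrative assumptions, not the
# GGUF Tool Suite's real recipe schema.
RECIPE = [
    (r"\.attn_.*\.weight$", "iq4_kt"),    # attention weights: mid precision
    (r"\.ffn_down\.weight$", "iq3_xxs"),  # FFN down-projections: smaller
    (r"token_embd\.weight$", "q8_0"),     # embeddings: keep near-lossless
]
DEFAULT_QUANT = "iq3_xxs"

def quant_for(tensor_name: str) -> str:
    """Return the quant type for a tensor; the first matching rule wins."""
    for pattern, qtype in RECIPE:
        if re.search(pattern, tensor_name):
            return qtype
    return DEFAULT_QUANT

if __name__ == "__main__":
    for name in ("blk.0.attn_q.weight", "blk.0.ffn_down.weight", "token_embd.weight"):
        print(f"{name} -> {quant_for(name)}")
```

First-match-wins keeps a recipe easy to reason about: more specific rules go first, and a single default catches everything else.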
Alternatives and similar repositories for GGUF-Tool-Suite
Users interested in GGUF-Tool-Suite are comparing it to the libraries listed below.
- Generate a llama-quantize command to copy the quantization parameters of any GGUF (see the sketch after this list) ☆23 · Updated last week
- Lightweight C inference for Qwen3 GGUF with the smallest (0.6B) at the fullest (FP32) ☆15 · Updated last week
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆43 · Updated 10 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆41 · Updated 3 weeks ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆98 · Updated last month
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 · Updated 8 months ago
- ☆95 · Updated 7 months ago
- Load and run Llama from safetensors files in C ☆12 · Updated 9 months ago
- ☆57 · Updated last month
- Local LLM inference & management server with built-in OpenAI API ☆31 · Updated last year
- 1.58-bit LLaMa model ☆81 · Updated last year
- ☆24 · Updated 6 months ago
- AirLLM 70B inference with single 4GB GPU ☆14 · Updated last month
- FMS Model Optimizer is a framework for developing reduced precision neural network models. ☆20 · Updated this week
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ☆11 · Updated 2 months ago
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆22 · Updated last year
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 6 months ago
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- ☆9 · Updated last year
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati… ☆28 · Updated 3 months ago
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆47 · Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆31 · Updated 3 months ago
- ☆51 · Updated last year
- automatically quant GGUF models ☆190 · Updated last week
- A local front-end for open-weight LLMs with memory, RAG, TTS/STT, Elo ratings, and dynamic research tools. Built with React and FastAPI. ☆30 · Updated this week
- Visual Tagger is a JavaScript tool that visually highlights HTML elements for AIs, aiding in identifying interactive components on web pa… ☆10 · Updated 9 months ago
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆42 · Updated last month
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆67 · Updated last month
- Lightweight continuous batching OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆26 · Updated 4 months ago
- instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… ☆53 · Updated last year
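The first entry in the list above addresses a related problem: reproducing an existing GGUF's per-tensor quant layout as a llama-quantize command. Here is a minimal sketch of that idea, assuming the gguf-py package's GGUFReader interface (`pip install gguf`) and a llama-quantize build that accepts per-tensor `--tensor-type` overrides; both are assumptions, so verify against your installed versions and that repo's actual output.

```python
"""Sketch: read per-tensor quant types from an existing GGUF and print them
as llama-quantize override flags. Assumes gguf-py's GGUFReader exposes
.tensors with .name and .tensor_type, and that your llama-quantize build
supports `--tensor-type <name>=<type>`; check its --help before relying on this."""
import sys

from gguf import GGUFReader  # gguf-py, distributed with llama.cpp


def main(path: str) -> None:
    reader = GGUFReader(path)
    # Collect (tensor name, quant type) pairs from the source file.
    overrides = [(t.name, t.tensor_type.name.lower()) for t in reader.tensors]
    # One override flag per tensor: verbose, but makes the mapping explicit.
    flags = " ".join(f'--tensor-type "{name}={qtype}"' for name, qtype in overrides)
    print(f"llama-quantize {flags} <input-f16.gguf> <output.gguf> <fallback-type>")


if __name__ == "__main__":
    main(sys.argv[1])
```

The placeholders in angle brackets are left for the user to fill in; the fallback type only applies to tensors not covered by an override.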