Thireus / GGUF-Tool-Suite
Input your VRAM and RAM capacities and the toolchain produces a GGUF model tuned to your system within seconds: flexible model sizing and the lowest achievable perplexity, for advanced users who want precise, automated production of dynamic GGUF quants.
☆43 · Updated this week
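The description above boils down to a constrained optimization: spend a fixed VRAM/RAM byte budget where extra precision buys the largest perplexity reduction. Below is a minimal, hypothetical Python sketch of that idea using greedy payoff-per-byte upgrades; the quant names, bit widths, penalty figures, and sensitivity values are invented for illustration and this is not the suite's actual algorithm.

```python
# Hypothetical sketch, NOT GGUF-Tool-Suite's actual code: pick a quant
# type per tensor under a byte budget by greedily buying the upgrade
# with the best perplexity payoff per extra byte.
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    n_params: int       # number of weights in this tensor
    sensitivity: float  # invented: how much this tensor's error hurts PPL

# quant type -> (bits per weight, assumed per-weight perplexity penalty);
# all numbers here are illustrative, not measured.
QUANTS = {
    "Q2_K": (2.6, 0.60),
    "Q4_K": (4.5, 0.12),
    "Q5_K": (5.5, 0.05),
    "Q8_0": (8.5, 0.01),
}
ORDER = sorted(QUANTS, key=lambda q: QUANTS[q][0])  # cheapest first

def assign_quants(tensors: list[Tensor], budget_bytes: float) -> dict[str, str]:
    """Start every tensor at the cheapest type, then spend the remaining
    budget on whichever single-step upgrade pays off most per byte."""
    level = {t.name: 0 for t in tensors}  # index into ORDER per tensor
    used = sum(t.n_params * QUANTS[ORDER[0]][0] / 8 for t in tensors)
    while True:
        best = None  # (payoff per byte, tensor, extra bytes)
        for t in tensors:
            i = level[t.name]
            if i + 1 == len(ORDER):
                continue  # already at the largest type
            cur_bits, cur_ppl = QUANTS[ORDER[i]]
            nxt_bits, nxt_ppl = QUANTS[ORDER[i + 1]]
            extra = t.n_params * (nxt_bits - cur_bits) / 8
            payoff = t.sensitivity * t.n_params * (cur_ppl - nxt_ppl) / extra
            if used + extra <= budget_bytes and (best is None or payoff > best[0]):
                best = (payoff, t, extra)
        if best is None:  # no affordable upgrade left
            return {name: ORDER[i] for name, i in level.items()}
        _, t, extra = best
        level[t.name] += 1
        used += extra

# Example: two tensors, a 100 MB budget.
model = [Tensor("blk.0.attn_q", 50_000_000, 2.0),
         Tensor("blk.0.ffn_up", 140_000_000, 1.0)]
print(assign_quants(model, budget_bytes=100e6))
```

A real pipeline would measure per-tensor sensitivity (for example with an importance matrix or perplexity sweeps) rather than hard-coding it, and would search the quant-mix space more carefully than a one-step greedy loop.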
Alternatives and similar repositories for GGUF-Tool-Suite
Users interested in GGUF-Tool-Suite are comparing it to the libraries listed below.
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆82 · Updated this week
- Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing. ☆18 · Updated 2 weeks ago
- Automatically quantize GGUF models. ☆200 · Updated this week
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆42 · Updated 11 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning. ☆27 · Updated 4 months ago
- ☆100 · Updated 3 weeks ago
- Lightweight inference server for OpenVINO. ☆211 · Updated this week
- Generate a llama-quantize command that copies the quantization parameters of any GGUF (see the usage example after this list). ☆24 · Updated last month
- Simple node proxy for llama-server that enables MCP use. ☆13 · Updated 4 months ago
- InferX is an Inference Function-as-a-Service platform. ☆133 · Updated this week
- Running Microsoft's BitNet via Electron, React & Astro. ☆44 · Updated 3 months ago
- ☆83 · Updated this week
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆42 · Updated last week
- Sparse inferencing for transformer-based LLMs. ☆197 · Updated last month
- Easily view and modify JSON datasets for large language models. ☆82 · Updated 4 months ago
- ☆23 · Updated 10 months ago
- llama.cpp fork with additional SOTA quants and improved performance. ☆28 · Updated last week
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆110 · Updated 2 months ago
- ☆122 · Updated 10 months ago
- ☆209 · Updated last week
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆136 · Updated this week
- Lightweight & fast AI inference proxy for self-hosted LLM backends like Ollama, LM Studio and others. Designed for speed, simplicity and… ☆87 · Updated last week
- ☆50 · Updated 7 months ago
- A pipeline-parallel training script for LLMs. ☆159 · Updated 4 months ago
- LLM Ripper is a framework for component extraction (embeddings, attention heads, FFNs), activation capture, functional analysis, and adap… ☆46 · Updated last week
- ☆62 · Updated 2 months ago
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆28 · Updated last month
- ☆133 · Updated 4 months ago
- ☆24 · Updated 7 months ago
- ☆20 · Updated 11 months ago
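For the llama-quantize entry above, a hedged usage example: llama-quantize is llama.cpp's quantization CLI, and the flags shown (--allow-requantize, --token-embedding-type, --output-tensor-type) are real llama.cpp options, but the specific type values below are placeholders standing in for parameters that such a tool would read out of a source GGUF.

```sh
# Requantize to Q4_K_M while pinning the token-embedding and output
# tensors to q8_0 (placeholder values copied from a reference GGUF).
llama-quantize --allow-requantize \
    --token-embedding-type q8_0 \
    --output-tensor-type q8_0 \
    input-f16.gguf output-Q4_K_M.gguf Q4_K_M
```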