iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆205 · Updated 3 weeks ago
Alternatives and similar repositories for gguf-docs
Users interested in gguf-docs are comparing it to the libraries listed below.
- InferX is an Inference Function-as-a-Service platform ☆119 · Updated 2 weeks ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆98 · Updated last month
- ☆207 · Updated 2 weeks ago
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆142 · Updated 2 months ago
- Lemonade helps users run local LLMs with the highest performance by configuring state-of-the-art inference engines for their NPUs and GPU… ☆381 · Updated this week
- Minimal Linux OS with a Model Context Protocol (MCP) gateway to expose local capabilities to LLMs. ☆260 · Updated last month
- AI management tool ☆118 · Updated 9 months ago
- Lightweight inference server for OpenVINO ☆191 · Updated 2 weeks ago
- ☆226 · Updated 2 months ago
- Sparse inferencing for transformer-based LLMs ☆196 · Updated last week
- ☆133 · Updated 3 months ago
- Blue-text Bot AI. Uses Ollama + AppleScript ☆50 · Updated last year
- A Conversational Speech Generation Model with a Gradio UI and an OpenAI-compatible API. UI and API support CUDA, MLX, and CPU devices. ☆194 · Updated 3 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆158 · Updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆140 · Updated 5 months ago
- ☆152 · Updated last week
- ☆155 · Updated 3 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆78 · Updated 10 months ago
- A local AI companion that uses a collection of free, open-source AI models to create two virtual companions that will follow you… ☆227 · Updated 2 weeks ago
- ☆109 · Updated this week
- Fast parallel LLM inference for MLX ☆204 · Updated last year
- ☆28 · Updated last month
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆466 · Updated this week
- Automatically quantizes GGUF models ☆190 · Updated this week
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding without retraining. ☆39 · Updated 3 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆964 · Updated last week
- ☆81 · Updated last week
- ☆312 · Updated last week
- Official Python implementation of the UTCP ☆364 · Updated last week
- ☆132 · Updated 3 months ago