iuliaturc / gguf-docsLinks

Docs for GGUF quantization (unofficial)

☆319

Alternatives and similar repositories for gguf-docs

Users that are interested in gguf-docs are comparing it to the libraries listed below

Sorting:

turboderp-org / exllamav3
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
☆578Updated 2 weeks ago
inferx-net / inferx
InferX: Inference as a Service Platform
☆139Updated this week
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆143Updated 4 months ago
TheProxyCompany / proxy-structuring-engine
Guaranteed Structured Output from any Language Model via Hierarchical State Machines
☆145Updated last month
SearchSavior / OpenArc
Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.
☆247Updated 3 weeks ago
sam-paech / slop-forensics
☆289Updated 3 weeks ago
CohleM / lilLM
A little(lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture.
☆52Updated 7 months ago
codelion / ellora
Enhancing LLMs with LoRA
☆176Updated last month
matt-c1 / llama-3-quant-comparison
Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.
☆165Updated last year
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆1,341Updated this week
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆233Updated last year
matteoserva / GraphLLM
☆209Updated 2 months ago
akashjss / sesame-csm
A Conversational Speech Generation Model with Gradio UI and OpenAI compatible API. UI and API support CUDA, MLX and CPU devices.
☆208Updated 6 months ago
leoheuler / flashtensors
☆399Updated 2 weeks ago
huawei-csl / SINQ
Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model …
☆578Updated this week
NimbleEdge / sparse_transformers
Sparse Inferencing for transformer based LLMs
☆213Updated 3 months ago
jd-3d / SOLOBench
☆135Updated 6 months ago
leafspark / AutoGGUF
automatically quant GGUF models
☆214Updated last month
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆46Updated last month
transformerlab / transformerlab-api
API Server for Transformer Lab
☆80Updated last week
iluxu / llmbasedos
llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work
☆278Updated 3 months ago
SlerpE / highCompute.py
☆28Updated 5 months ago
babycommando / neuralgraffiti
Live-bending a foundation model’s output at neural network level.
☆270Updated 7 months ago
NVlabs / Jet-Nemotron
☆703Updated last week
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆84Updated last month
kalavai-net / kalavai-client
A platform to self-host AI on easy mode
☆177Updated last week
teabranch / open-responses-server
Wraps any OpenAI API interface as Responses with MCPs support so it supports Codex. Adding any missing stateful features. Ollama and Vllm…
☆132Updated 3 weeks ago
Infini-AI-Lab / UMbreLLa
LLM Inference on consumer devices
☆125Updated 8 months ago
av / klmbr
klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs
☆85Updated last year
TC-Zheng / ActuosusAI
AI management tool
☆121Updated last year