iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆284 · Updated 3 months ago
Alternatives and similar repositories for gguf-docs
Users interested in gguf-docs are comparing it to the repositories listed below.
- InferX: Inference as a Service Platform ☆136 · Updated this week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆532 · Updated this week
- Sparse inference for transformer-based LLMs ☆201 · Updated 2 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆137 · Updated 3 months ago
- A little (lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆52 · Updated 5 months ago
- Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any Large Language Model … ☆504 · Updated this week
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- Enhancing LLMs with LoRA ☆163 · Updated last month
- ☆28 · Updated 4 months ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, and Kokoro-TTS over OpenAI-compatible endpoints. ☆213 · Updated this week
- The Fastest Way to Fine-Tune LLMs Locally ☆322 · Updated 7 months ago
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆145 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆1,258 · Updated this week
- ☆273 · Updated 4 months ago
- A Conversational Speech Generation Model with a Gradio UI and OpenAI-compatible API. The UI and API support CUDA, MLX, and CPU devices. ☆206 · Updated 5 months ago
- AI management tool ☆121 · Updated 11 months ago
- ☆207 · Updated last month
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆275 · Updated last month
- An LLM trained only on data from certain time periods to reduce modern bias ☆562 · Updated 3 weeks ago
- Official Python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware. ☆579 · Updated last week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆81 · Updated this week
- ☆102 · Updated last month
- A platform to self-host AI on easy mode ☆170 · Updated this week
- ☆83 · Updated last week
- Live-bending a foundation model's output at the neural-network level. ☆266 · Updated 6 months ago
- ☆179 · Updated last month
- ☆225 · Updated 5 months ago
- ☆135 · Updated 5 months ago
- Model swapping for llama.cpp (or any local OpenAI-API-compatible server) ☆1,690 · Updated this week
- ☆166 · Updated 2 months ago