iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆301 · Updated 3 months ago
Alternatives and similar repositories for gguf-docs
Users interested in gguf-docs are comparing it to the libraries listed below.
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆552 · Updated last week
- InferX: Inference as a Service Platform ☆138 · Updated this week
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆141 · Updated 4 months ago
- A little(lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆52 · Updated 6 months ago
- Enhancing LLMs with LoRA ☆173 · Updated 2 weeks ago
- Sparse Inferencing for transformer based LLMs ☆201 · Updated 2 months ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints. ☆236 · Updated last week
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆44 · Updated last week
- ☆283 · Updated last week
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆145 · Updated last month
- ☆85 · Updated last month
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- ☆105 · Updated 2 months ago
- ☆207 · Updated 2 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆1,296 · Updated this week
- AI management tool ☆121 · Updated last year
- ☆105 · Updated 4 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆82 · Updated last week
- The Fastest Way to Fine-Tune LLMs Locally ☆324 · Updated 7 months ago
- Blue-text Bot AI. Uses Ollama + AppleScript ☆50 · Updated last year
- A platform to self-host AI on easy mode ☆173 · Updated this week
- A Conversational Speech Generation Model with Gradio UI and OpenAI compatible API. UI and API support CUDA, MLX and CPU devices. ☆206 · Updated 6 months ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆564 · Updated last week
- A LLM trained only on data from certain time periods to reduce modern bias ☆607 · Updated last month
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆277 · Updated 2 months ago
- API Server for Transformer Lab ☆78 · Updated this week
- Fast parallel LLM inference for MLX ☆225 · Updated last year
- Big & Small LLMs working together ☆1,187 · Updated this week
- ☆135 · Updated 6 months ago
- ☆699 · Updated 3 weeks ago