iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆361 · Updated 6 months ago
Alternatives and similar repositories for gguf-docs
Users interested in gguf-docs are comparing it to the repositories listed below
- InferX: Inference-as-a-Service platform ☆151 · Updated last week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆622 · Updated this week
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆156 · Updated 6 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆1,553 · Updated this week
- Sparse inferencing for transformer-based LLMs ☆218 · Updated 5 months ago
- ☆209 · Updated 3 weeks ago
- Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method designed to make any Large Language Model … ☆588 · Updated 2 weeks ago
- The fastest way to fine-tune LLMs locally ☆333 · Updated last month
- Enhancing LLMs with LoRA ☆206 · Updated 3 months ago
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding and rerank models over OpenAI-compatible endpoints. ☆283 · Updated last week
- A little (lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆55 · Updated 9 months ago
- ☆89 · Updated last month
- ☆304 · Updated 3 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆48 · Updated 3 months ago
- ☆109 · Updated 5 months ago
- A conversational speech generation model with a Gradio UI and OpenAI-compatible API. UI and API support CUDA, MLX and CPU devices. ☆210 · Updated 8 months ago
- Big & small LLMs working together ☆1,249 · Updated last week
- Automatically quantize GGUF models ☆219 · Updated last month
- Fast parallel LLM inference for MLX ☆243 · Updated last year
- Train large language models on MLX. ☆241 · Updated 2 weeks ago
- AI management tool ☆119 · Updated last year
- DFloat11 [NeurIPS '25]: Lossless compression of LLMs and DiTs for efficient GPU inference ☆594 · Updated 2 months ago
- ☆336 · Updated 6 months ago
- API server for Transformer Lab ☆82 · Updated 2 months ago
- llmbasedos — Local-first OS where your AI agents wake up and work ☆281 · Updated 3 weeks ago
- Distributed inference for MLX LLMs ☆100 · Updated last year
- Simple Python library/structure to ablate features in LLMs that are supported by TransformerLens ☆564 · Updated last year
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2 ☆165 · Updated last year
- ☆439 · Updated last month
- ☆135 · Updated 8 months ago