iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆333 · Updated 5 months ago
Alternatives and similar repositories for gguf-docs
Users interested in gguf-docs are comparing it to the libraries listed below.
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆597 · Updated last week
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆148 · Updated 5 months ago
- InferX: Inference as a Service platform ☆143 · Updated this week
- Fast parallel LLM inference for MLX ☆235 · Updated last year
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- Enhancing LLMs with LoRA ☆193 · Updated 2 months ago
- A little (lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆53 · Updated 7 months ago
- Sparse inferencing for transformer-based LLMs ☆215 · Updated 4 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆47 · Updated last month
- AI management tool ☆121 · Updated last year
- ☆210 · Updated 3 months ago
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI endpoints. ☆261 · Updated last week
- Welcome to the official repository of SINQ! A novel, fast, and high-quality quantization method designed to make any Large Language Model … ☆583 · Updated this week
- ☆424 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆1,390 · Updated this week
- ☆88 · Updated last week
- ☆108 · Updated 4 months ago
- Testing LLM reasoning abilities with family-relationship quizzes. ☆63 · Updated 10 months ago
- ☆298 · Updated last month
- Train Large Language Models on MLX. ☆232 · Updated last week
- Big & small LLMs working together ☆1,226 · Updated this week
- ☆109 · Updated 6 months ago
- FastMLX is a high-performance, production-ready API to host MLX models. ☆337 · Updated 9 months ago
- ☆715 · Updated 3 weeks ago
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆99 · Updated 5 months ago
- Chrome & Firefox extension to chat with webpages using local LLMs ☆128 · Updated last year
- Official repository for "DynaSaur: Large Language Agents Beyond Predefined Actions" ☆351 · Updated 11 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆86 · Updated last year
- API server for Transformer Lab ☆81 · Updated last month
- Simple Python library/structure to ablate features in LLMs that are supported by TransformerLens ☆541 · Updated last year