iuliaturc / gguf-docs
Docs for GGUF quantization (unofficial)
☆261 · Updated 2 months ago
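For context, the kind of workflow these docs cover looks roughly like the sketch below: loading a GGUF-quantized model and running a short completion. This is a minimal illustration assuming the llama-cpp-python bindings are installed; the model path is a placeholder, and nothing here is taken from the gguf-docs repo itself.

```python
# Minimal sketch: load a GGUF-quantized model with llama-cpp-python and run
# a short completion. The model path is a placeholder, not a file shipped
# by gguf-docs.
from llama_cpp import Llama

# Load a quantized GGUF file (e.g. a Q4_K_M export of a small model).
llm = Llama(model_path="models/qwen3-0.6b-q4_k_m.gguf", n_ctx=2048)

# Generate a few tokens to confirm the quantized weights load and decode.
out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```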
Alternatives and similar repositories for gguf-docs
Users interested in gguf-docs are comparing it to the libraries listed below.
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆114 · Updated 2 months ago
- InferX is an Inference Function-as-a-Service platform ☆133 · Updated last week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆499 · Updated this week
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆163 · Updated last year
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆146 · Updated 3 months ago
- A little (lil) Language Model (LM). A tiny reproduction of LLaMA 3's model architecture. ☆52 · Updated 4 months ago
- Enhancing LLMs with LoRA ☆137 · Updated last week
- Lightweight inference server for OpenVINO ☆211 · Updated this week
- Sparse inferencing for transformer-based LLMs ☆197 · Updated last month
- ☆209 · Updated 2 weeks ago
- An LLM trained only on data from certain time periods to reduce modern bias ☆536 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆1,181 · Updated this week
- llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work ☆273 · Updated last month
- ☆266 · Updated 3 months ago
- A Conversational Speech Generation Model with Gradio UI and OpenAI-compatible API. UI and API support CUDA, MLX and CPU devices. ☆201 · Updated 4 months ago
- AI management tool ☆121 · Updated 10 months ago
- ☆133 · Updated 4 months ago
- Official repository for "DynaSaur: Large Language Agents Beyond Predefined Actions" ☆350 · Updated 9 months ago
- ☆165 · Updated last month
- The Fastest Way to Fine-Tune LLMs Locally ☆320 · Updated 6 months ago
- ☆28 · Updated 3 months ago
- Live-bending a foundation model’s output at the neural network level. ☆265 · Updated 5 months ago
- ☆334 · Updated this week
- A platform to self-host AI on easy mode ☆167 · Updated last week
- ☆178 · Updated 2 weeks ago
- FastMLX is a high-performance, production-ready API to host MLX models. ☆331 · Updated 6 months ago
- Blue-text Bot AI. Uses Ollama + AppleScript ☆50 · Updated last year
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆42 · Updated 2 weeks ago
- Official Python implementation of UTCP. UTCP is an open standard that lets AI agents call any API directly, without extra middleware. ☆545 · Updated this week
- ☆99 · Updated 3 months ago