unslothai / llama.cpp
LLM inference in C/C++
☆104 · Updated last week
Alternatives and similar repositories for llama.cpp
Users who are interested in llama.cpp are comparing it to the libraries listed below.
- Distributed Inference for mlx LLm ☆100 · Updated last year
- ☆109 · Updated 5 months ago
- automatically quant GGUF models ☆219 · Updated last month
- LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities. ☆107 · Updated 6 months ago
- Unsloth Studio ☆126 · Updated 10 months ago
- ☆68 · Updated last year
- Easy to use, High Performant Knowledge Distillation for LLMs ☆97 · Updated 9 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆65 · Updated last year
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆157 · Updated 7 months ago
- Kyutai with an "eye" ☆235 · Updated 10 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆49 · Updated 3 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆92 · Updated last year
- 1.58 Bit LLM on Apple Silicon using MLX ☆242 · Updated last year
- ☆119 · Updated last year
- A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX. ☆100 · Updated 7 months ago
- ☆101 · Updated last year
- Simple examples using Argilla tools to build AI ☆57 · Updated last year
- ☆57 · Updated 11 months ago
- ☆141 · Updated 5 months ago
- Gemma 2 optimized for your local machine. ☆378 · Updated last year
- ☆94 · Updated 7 months ago
- ☆166 · Updated 6 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆110 · Updated 8 months ago
- ☆159 · Updated 9 months ago
- A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon. ☆226 · Updated 3 months ago
- Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies a… ☆39 · Updated 9 months ago
- Utils for Unsloth https://github.com/unslothai/unsloth ☆188 · Updated last week
- Fast parallel LLM inference for MLX ☆245 · Updated last year
- This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang ☆100 · Updated last week
- ☆135 · Updated last month