unslothai / llama.cpp
LLM inference in C/C++
☆104 Updated last week
Alternatives and similar repositories for llama.cpp
Users who are interested in llama.cpp are comparing it to the libraries listed below.
- ☆109 Updated 5 months ago
- Distributed Inference for MLX LLMs ☆100 Updated last year
- Simple examples using Argilla tools to build AI ☆57 Updated last year
- Unsloth Studio ☆126 Updated 10 months ago
- LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities. ☆108 Updated 6 months ago
- Automatically quantize GGUF models ☆219 Updated last month
- Utils for Unsloth https://github.com/unslothai/unsloth ☆191 Updated this week
- Tool to download models from Huggingface Hub and convert them to GGML/GGUF for llama.cpp ☆170 Updated 9 months ago
- Easy to use, High-Performance Knowledge Distillation for LLMs ☆97 Updated 9 months ago
- Kyutai with an "eye" ☆236 Updated 10 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆110 Updated 8 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆243 Updated last year
- This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang ☆100 Updated this week
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆49 Updated 3 months ago
- 1.58-bit LLaMa model ☆82 Updated last year
- Testing LLM reasoning abilities with family relationship quizzes. ☆63 Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours ☆65 Updated last year
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆29 Updated 10 months ago
- ☆141 Updated 5 months ago
- ☆94 Updated 7 months ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ☆157 Updated 7 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆232 Updated last month
- GRadient-INformed MoE ☆264 Updated last year
- ☆166 Updated 6 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆569 Updated 2 months ago
- Coding problems used in aider's polyglot benchmark ☆199 Updated last year
- LLM inference in C/C++ ☆21 Updated 10 months ago
- ☆57 Updated 11 months ago
- ☆159 Updated 9 months ago
- Sparse Inferencing for transformer based LLMs ☆217 Updated 5 months ago