unslothai / llama.cpp

LLM inference in C/C++

☆104

Alternatives and similar repositories for llama.cpp

Users interested in llama.cpp are comparing it to the libraries listed below.

chigkim / Ollama-MMLU-Pro
☆109 · Updated 5 months ago
mzbac / mlx_sharding
Distributed inference for MLX LLMs
☆100 · Updated last year
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆57 · Updated last year
unslothai / unsloth-studio
Unsloth Studio
☆126 · Updated 10 months ago
uukuguy / speechless
LLM-based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.
☆108 · Updated 6 months ago
leafspark / AutoGGUF
Automatically quantize GGUF models
☆219 · Updated last month
unslothai / unsloth-zoo
Utils for Unsloth https://github.com/unslothai/unsloth
☆191 · Updated this week
akx / ggify
Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp
☆170 · Updated 9 months ago
agokrani / distillKitPlus
Easy-to-use, high-performance knowledge distillation for LLMs
☆97 · Updated 9 months ago
kyutai-labs / moshivis
Kyutai with an "eye"
☆236 · Updated 10 months ago
tiiuae / onebitllms
Lightweight toolkit to train and fine-tune 1.58-bit language models
☆110 · Updated 8 months ago
exo-explore / mlx-bitnet
1.58-bit LLM on Apple Silicon using MLX
☆243 · Updated last year
sgl-project / sgl-project.github.io
This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang
☆100 · Updated this week
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆49 · Updated 3 months ago
rafacelente / bllama
1.58-bit LLaMa model
☆82 · Updated last year
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆63 · Updated last year
Cerebras / DocChat
GPT-4-level conversational QA trained in a few hours
☆65 · Updated last year
huseinzol05 / transformers-continuous-batching
Lightweight continuous batching with OpenAI compatibility, using Hugging Face Transformers and including T5 and Whisper.
☆29 · Updated 10 months ago
QuixiAI / spectrum
☆141 · Updated 5 months ago
and270 / thinking_effort_processor
☆94 · Updated 7 months ago
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆157 · Updated 7 months ago
CerebrasResearch / reap
REAP: Router-weighted Expert Activation Pruning for SMoE compression
☆232 · Updated last month
microsoft / GRIN-MoE
GRadient-INformed MoE
☆264 · Updated last year
QuixiAI / OpenChatML
☆166 · Updated 6 months ago
WeiboAI / VibeThinker
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
☆569 · Updated 2 months ago
Aider-AI / polyglot-benchmark
Coding problems used in aider's polyglot benchmark
☆199 · Updated last year
fairydreaming / llama.cpp
LLM inference in C/C++
☆21 · Updated 10 months ago
ArturTanona / grpo_unsloth_docker
☆57 · Updated 11 months ago
janhq / ReZero
☆159 · Updated 9 months ago
NimbleEdge / sparse_transformers
Sparse inference for transformer-based LLMs
☆217 · Updated 5 months ago