fairydreaming / llama.cpp

LLM inference in C/C++

☆21

Alternatives and similar repositories for llama.cpp

Users who are interested in llama.cpp are comparing it to the repositories listed below.

chigkim / Ollama-MMLU-Pro
☆109Updated 5 months ago
xhedit / quantkit
CLI tool to quantize GGUF, GPTQ, AWQ, HQQ and EXL2 models
☆78Updated last year
leafspark / AutoGGUF
Automatically quantize GGUF models
☆219Updated last month
LostRuins / datasetexplorer
Easily view and modify JSON datasets for large language models
☆87Updated 8 months ago
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆49Updated 3 months ago
unslothai / llama.cpp
LLM inference in C/C++
☆104Updated last week
tdrussell / qlora-pipe
A pipeline parallel training script for LLMs.
☆166Updated 9 months ago
remichu-ai / gallama
☆135Updated last month
nyunAI / PruneGPT
☆51Updated last year
SicariusSicariiStuff / SLOP_Detector
SLOP Detector and analyzer based on dictionary for shareGPT JSON and text
☆81Updated this week
epolewski / EricLLM
A fast batching API to serve LLM models
☆189Updated last year
mzbac / mlx_sharding
Distributed inference for MLX LLMs
☆100Updated last year
rombodawg / Easy_training
☆51Updated 11 months ago
flamingrickpat / private-machine
private-machine is an AI companion system with emotion, needs and goals simulation. Very silly, not based on real science.
☆28Updated 2 months ago
rafacelente / bllama
1.58-bit LLaMa model
☆82Updated last year
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆88Updated this week
uukuguy / speechless
LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.
☆107Updated 6 months ago
huseinzol05 / transformers-continuous-batching
Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper.
☆29Updated 10 months ago
severian42 / MoA-Ollama-Chat
This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a…
☆118Updated last year
Ce-daros / Tinystory-LM-656K-param
☆22Updated last year
fidecastro / llama-cpp-connector
Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run!
☆29Updated last month
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆180Updated last year
adrienbrault / hf-gguf-to-ollama
Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com.
☆119Updated last year
abgulati / kosmos-2_5-containerized
Kosmos-2.5 is a cutting-edge Multimodal-LLM (MLLM) specializing in image OCR. However, its stringent software requirements & Python-scrip…
☆67Updated last year
QuixiAI / OpenChatML
☆166Updated 5 months ago
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆63Updated last year
abgulati / hf-waitress
Serving LLMs in the HF-Transformers format via a PyFlask API
☆72Updated last year
nath1295 / MLX-Textgen
A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX.
☆100Updated 7 months ago
CerebrasResearch / reap
REAP: Router-weighted Expert Activation Pruning for SMoE compression
☆222Updated last month
monk1337 / auto-ollama
run ollama & gguf easily with a single command
☆52Updated last year