andrewkchan / deepseek.cpp
CPU inference for the DeepSeek family of large language models in pure C++
☆282 · Updated last month
Alternatives and similar repositories for deepseek.cpp:
Users interested in deepseek.cpp are comparing it to the libraries listed below.
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆279 · Updated 2 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆231 · Updated this week
- LM inference server implementation based on *.cpp ☆154 · Updated this week
- Efficient LLM Inference over Long Sequences ☆365 · Updated last month
- A fast communication-overlapping library for tensor/expert parallelism on GPUs ☆811 · Updated 2 weeks ago
- Efficient inference of large language models ☆146 · Updated 3 months ago
- Low-bit LLM inference on CPU with lookup table ☆705 · Updated 2 months ago
- Muon is Scalable for LLM Training ☆993 · Updated this week
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆622 · Updated this week
- Advanced quantization algorithm for LLMs/VLMs ☆413 · Updated this week
- Materials for learning SGLang ☆360 · Updated last week
- Free Search is a wrapper on top of publicly available SearXNG instances that provides free internet access as a REST API ☆147 · Updated this week
- A throughput-oriented high-performance serving framework for LLMs ☆785 · Updated 6 months ago
- Run DeepSeek-R1 GGUFs on KTransformers ☆212 · Updated 3 weeks ago
- Self-hosted voice chat with LLMs ☆422 · Updated last month
- A quantization algorithm for LLMs ☆137 · Updated 9 months ago
- Kyutai with an "eye" ☆160 · Updated last week
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model ☆330 · Updated 9 months ago
- Review/check GGUF files and estimate memory usage and maximum tokens per second ☆139 · Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆149 · Updated last week
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆478 · Updated 2 months ago
- Inference of Mamba models in pure C ☆187 · Updated last year
- prime is a framework for efficient, globally distributed training of AI models over the internet ☆689 · Updated this week
- 📋 NotebookMLX, an open-source version of NotebookLM (ported from NotebookLlama) ☆267 · Updated 3 weeks ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,696 · Updated 3 weeks ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆236 · Updated this week