slaren / llama.cpp

Port of Facebook's LLaMA model in C/C++

☆10

Alternatives and similar repositories for llama.cpp:

Users who are interested in llama.cpp are comparing it to the libraries listed below.

catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54 Updated last year
futo-org / whisper-acft
☆124 Updated 10 months ago
keeeeenw / TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆11 Updated last year
mmwillet / TTS.cpp
TTS support with GGML
☆32 Updated this week
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆70 Updated 6 months ago
hscspring / llama.np
Inference Llama/Llama2/Llama3 Models in NumPy
☆20 Updated last year
kroggen / mamba.c
Inference of Mamba models in pure C
☆188 Updated last year
casper-hansen / AutoAWQ_kernels
☆72 Updated 5 months ago
deepgrove-ai / Bonsai
☆20 Updated last month
MollySophia / rwkv-mobile
Inference RWKV with multiple supported backends.
☆43 Updated this week
Ce-daros / ai-bash-response
☆11 Updated last year
jameswdelancey / llama3.c
A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode…
☆126 Updated 9 months ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆56 Updated last year
The-Swarm-Corporation / Mamba-R1
Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…
☆20 Updated 2 weeks ago
RWKV / RWKV-wiki
RWKV centralised docs for the community
☆24 Updated last month
ggerganov / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆26 Updated last year
StudyingLover / ggml-tutorial
☆30 Updated 8 months ago
argosopentech / MetalTranslate
Customizable machine translation in C++
☆51 Updated last year
cahya-wirawan / rwkv-tokenizer
A fast RWKV Tokenizer written in Rust
☆45 Updated last month
MollySophia / rwkv-qualcomm
Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK
☆64 Updated this week
deepshard / mixtral-8x7b-Inference
Eh, simple and works.
☆27 Updated last year
FL33TW00D / coremlprofiler
Profile your CoreML models directly from Python 🐍
☆27 Updated 6 months ago
wozeparrot / tinyrwkv
tinygrad port of the RWKV large language model.
☆44 Updated 2 months ago
cwhy / rwkv-decon
Trying to deconstruct RWKV in understandable terms
☆14 Updated 2 years ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆52 Updated last year
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆137 Updated this week
cjpais / whisperfile
☆54 Updated 8 months ago
andrewkchan / deepseek.cpp
CPU inference for the DeepSeek family of large language models in C++
☆291 Updated this week
marty1885 / llama.cpp
My development fork of llama.cpp. For now working on RK3588 NPU and Tenstorrent backend
☆91 Updated 2 weeks ago
daquexian / faster-rwkv
☆124 Updated last year