jart / llama.cpp

Port of Facebook's LLaMA model in C/C++

☆23

Alternatives and similar repositories for llama.cpp

Users who are interested in llama.cpp are comparing it to the libraries listed below.

nomic-ai / kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). …
☆51 Updated 11 months ago
npk48 / rwkv_cuda
☆11 Updated 2 years ago
dmatora / LLM-inference-speed-benchmarks
☆21 Updated last year
MaggotHATE / Llama_chat
A chat UI for Llama.cpp
☆15 Updated 2 months ago
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆112 Updated last year
coldlarry / llama2.cpp
Inference Llama 2 in one file of pure C
☆12 Updated 2 years ago
leloykun / llama2.cpp
Inference Llama 2 in one file of pure C++
☆87 Updated 2 years ago
philpax / ggml
Tensor library for machine learning
☆21 Updated 2 years ago
PABannier / biogpt.cpp
Port of Microsoft's BioGPT in C/C++ using ggml
☆85 Updated last year
fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆153 Updated last year
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆55 Updated last year
janhq / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a…
☆42 Updated 7 months ago
openvinotoolkit / openvino_tokenizers
OpenVINO Tokenizers extension
☆48 Updated last week
determined-ai / determined-examples
Example ML projects that use the Determined library.
☆32 Updated last year
onnx / steering-committee
Notes and artifacts from the ONNX steering committee
☆28 Updated last week
abetlen / ggml-python
Python bindings for ggml
☆147 Updated last year
GaoYusong / llm.cpp
A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity.
☆42 Updated last year
gf712 / gpt2-cpp
GPT2 implementation in C++ using Ort
☆26 Updated 5 years ago
deaneeth / tinygpu
A lightweight Python-based GPU architecture simulator that demonstrates how parallel threads, registers, memory, and instructions work on…
☆40 Updated 2 weeks ago
casper-hansen / AutoAWQ_kernels
☆79 Updated last year
janhq / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a…
☆42 Updated last year
rahuldshetty / starcoder.js
Web browser version of StarCoder.cpp
☆46 Updated 2 years ago
xyzhang626 / embeddings.cpp
ggml implementation of embedding models including SentenceTransformer and BGE
☆63 Updated 2 years ago
kroggen / mamba.c
Inference of Mamba and Mamba2 models in pure C
☆196 Updated 2 weeks ago
ggerganov / ggterm
Terminal configuration for C++ development with Vim
☆72 Updated last week
sambanova / tutorials
☆13 Updated last year
HabanaAI / SynapseAI_Core
SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi
☆42 Updated last year
huggingface / optimum-amd
AMD related optimizations for transformer models
☆97 Updated 3 months ago
facebookresearch / FAMBench
Benchmarks to capture important workloads.
☆32 Updated 2 weeks ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆58 Updated last year