hjc4869 / llama.cpp
LLM inference in C/C++
☆13 · Updated 2 weeks ago
Alternatives and similar repositories for llama.cpp:
Users interested in llama.cpp are comparing it to the libraries listed below.
- automatically quant GGUF models ☆164 · Updated last week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆55 · Updated last month
- LLM inference in C/C++ ☆67 · Updated this week
- 8-bit CUDA functions for PyTorch, ROCm compatible ☆39 · Updated last year
- Core, Junction, and VRAM temperature reader for Linux + GDDR6/GDDR6X GPUs ☆37 · Updated 3 months ago
- Easily view and modify JSON datasets for large language models ☆71 · Updated 3 weeks ago
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆115 · Updated 10 months ago
- ☆52 · Updated this week
- A pipeline-parallel training script for LLMs. ☆136 · Updated last week
- ☆40 · Updated this week
- 8-bit CUDA functions for PyTorch ☆45 · Updated last month
- ☆81 · Updated 2 weeks ago
- LM inference server implementation based on *.cpp. ☆154 · Updated this week
- Croco.Cpp is a third-party testground for KoboldCPP, a simple one-file way to run various GGML/GGUF models with KoboldAI's UI. (for Croco.C… ☆100 · Updated this week
- GPU Power and Performance Manager ☆57 · Updated 5 months ago
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆22 · Updated this week
- Tool to download models from the Hugging Face Hub and convert them to GGML/GGUF for llama.cpp ☆127 · Updated 6 months ago
- Stable Diffusion and Flux in pure C/C++ ☆13 · Updated this week
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆200 · Updated last month
- ☆83 · Updated 3 months ago
- Lightweight inference server for OpenVINO ☆143 · Updated this week
- Fast and memory-efficient exact attention ☆163 · Updated this week
- llama.cpp to PyTorch converter ☆33 · Updated 11 months ago
- DeepSpeed Windows information ☆37 · Updated last year
- The heart of the Pulsar App: fast, secure, shared inference with a modern UI ☆56 · Updated 3 months ago
- idea: https://github.com/nyxkrage/ebook-groupchat/ ☆86 · Updated 7 months ago
- An interface with barely any external dependencies beyond the Ollama API itself, making it lightweight and portable to easily i… ☆12 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆222 · Updated this week
- ☆21 · Updated 5 months ago
- My personal fork of koboldcpp where I hack in experimental samplers. ☆44 · Updated 10 months ago