zhangpiu / llm.cpp
LLM training in simple C++/CUDA (with Eigen3)
☆17 · Updated last year
Alternatives and similar repositories for llm.cpp
Users interested in llm.cpp are comparing it to the libraries listed below.
- High-Performance FP32 GEMM on CUDA devices ☆117 · Updated last year
- A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity. ☆42 · Updated last year
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆113 · Updated last week
- ☆104 · Updated last year
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆550 · Updated 4 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆228 · Updated this week
- Perplexity GPU Kernels ☆554 · Updated 3 months ago
- Learning about CUDA by writing PTX code. ☆152 · Updated last year
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi ☆42 · Updated last year
- A GPU-driven system framework for scalable AI applications ☆124 · Updated last year
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- ☆286 · Updated this week
- Fast low-bit matmul kernels in Triton ☆427 · Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆64 · Updated 7 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆260 · Updated last year
- ☆96 · Updated 10 months ago
- Cataloging released Triton kernels. ☆292 · Updated 4 months ago
- ring-attention experiments ☆165 · Updated last year
- Fast and memory-efficient exact attention ☆111 · Updated last week
- ☆27 · Updated 2 years ago
- Inference Llama 2 in one file of pure C & one file with CUDA ☆32 · Updated 2 years ago
- Efficient LLM Inference over Long Sequences ☆394 · Updated 7 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆324 · Updated this week
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 · Updated last year
- Accelerating MoE with IO and Tile-aware Optimizations ☆569 · Updated 2 weeks ago
- Fastest kernels written from scratch ☆532 · Updated 4 months ago
- An extensible collectives library in Triton ☆95 · Updated 10 months ago
- A minimal cache manager for PagedAttention, on top of llama3. ☆135 · Updated last year
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆276 · Updated 6 months ago
- PyTorch distributed training acceleration framework ☆55 · Updated 5 months ago