GaoYusong / llm.cpp

A C++ port of karpathy/llm.c that features a tiny torch library while maintaining overall simplicity.

☆24

Alternatives and similar repositories for llm.cpp:

Users who are interested in llm.cpp are comparing it to the libraries listed below:

gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆91 Updated 9 months ago
dpuyda / scheduling
A simple, fast, minimalistic header-only library for running async tasks and executing task graphs.
☆51 Updated 2 months ago
syoyo / safetensors-cpp
Header-only safetensors loader and saver in C++
☆53 Updated 2 months ago
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆120 Updated this week
bd-iaas-us / InfiniStore
A distributed KV store for disaggregated LLM inference
☆31 Updated this week
andrewkchan / yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
☆247 Updated last month
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆76 Updated last month
abhisheknair10 / llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
☆45 Updated last week
apoorvnandan / lilgrad
PyTorch from scratch in pure C/CUDA and Python
☆40 Updated 4 months ago
tlc-pack / libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
☆104 Updated 5 months ago
howardlau1999 / rdmapp
C++ interfaces for RDMA access
☆66 Updated last month
leimao / Nsight-Compute-Docker-Image
Nsight Compute In Docker
☆11 Updated last year
nitnelave / lru_cache
A C++ implementation of an LRU cache
☆38 Updated 4 years ago
yusing / qalloc
A quick pool allocator for C++ with type info and GC support
☆2 Updated 2 years ago
archibate / sycltutor
A SYCL 2020 course from Teacher Xiaopeng (小彭老师), currently under construction and to be released later via livestream
☆15 Updated last year
DefTruth / hgemm-tensorcores-mma
⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, and achieve peak performance ⚡️
☆52 Updated 2 weeks ago
sandeepkumar-skb / pytorch_custom_op
End-to-end steps for adding custom ops in PyTorch.
☆20 Updated 4 years ago
yalue / cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
☆71 Updated 6 months ago
vllm-project / flash-attention
Fast and memory-efficient exact attention
☆44 Updated this week
dian-lun-lin / taro
Task graph-based asynchronous programming system using C++ coroutines
☆87 Updated last year
chenyu-jiang / nsys2json
A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format.
☆29 Updated last month
ProjectPhysX / PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆50 Updated last year
zeux / calm
CUDA/Metal accelerated language model inference
☆512 Updated 2 months ago
triton-inference-server / common
Common source, scripts and utilities shared across all Triton repositories.
☆68 Updated last week
taskflow / tfprof
Profiling Taskflow Programs through Visualization
☆49 Updated last year
ash-01xor / bpe.c
A simple byte pair encoding (BPE) mechanism for tokenization, written purely in C
☆126 Updated 3 months ago
microsoft / ark
A GPU-driven system framework for scalable AI applications
☆112 Updated 2 weeks ago
roastduck / FreeTensor
A language and compiler for irregular tensor programs.
☆135 Updated 2 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆161 Updated 7 months ago
staghado / vit.cpp
Vision Transformer (ViT) inference in plain C/C++ with ggml
☆255 Updated 10 months ago