zhangpiu / llm.cpp
LLM training in simple C++/CUDA (with Eigen3)
☆14 Updated 9 months ago
Alternatives and similar repositories for llm.cpp
Users who are interested in llm.cpp are comparing it to the libraries listed below
- High-Performance SGEMM on CUDA devices ☆94 Updated 4 months ago
- Reference Kernels for the Leaderboard ☆55 Updated this week
- Inference Llama/Llama2/Llama3 Models in NumPy ☆21 Updated last year
- A C++ port of karpathy/llm.c that features a tiny torch library while maintaining overall simplicity. ☆33 Updated 10 months ago
- Minimal C implementation of speculative decoding based on llama2.c ☆22 Updated 10 months ago
- Fast low-bit matmul kernels in Triton ☆311 Updated this week
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- ☆55 Updated this week
- ☆13 Updated 3 months ago
- Use safetensors with ONNX 🤗 ☆61 Updated 3 months ago
- Making the official Triton tutorials actually comprehensible ☆36 Updated 2 months ago
- Extensible collectives library in Triton ☆87 Updated 2 months ago
- ☆29 Updated 4 months ago
- Perplexity GPU Kernels ☆331 Updated this week
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆251 Updated 7 months ago
- ☆80 Updated 7 months ago
- Load compute kernels from the Hub ☆144 Updated this week
- Kernels, of the mega variety ☆329 Updated this week
- Ahead of Time (AOT) Triton Math Library ☆64 Updated last week
- High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline. ☆109 Updated 10 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆110 Updated 8 months ago
- Experiments with BitNet inference on CPU ☆54 Updated last year
- Asynchronous/distributed speculative evaluation for llama3 ☆39 Updated 10 months ago
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆127 Updated 10 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆23 Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆42 Updated 2 months ago
- Explore training for quantized models ☆18 Updated last week
- Python bindings for ggml ☆141 Updated 9 months ago
- ☆85 Updated 2 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆183 Updated last month