elinx / ugrad
A C++ implementation of the scalar-valued autograd engine micrograd
☆23 Updated 5 years ago
Alternatives and similar repositories for ugrad
Users who are interested in ugrad are comparing it to the libraries listed below.
- A C++ port of karpathy/llm.c that features a tiny torch library while maintaining overall simplicity. ☆33 Updated 10 months ago
- Scalar-valued Automatic Differentiation library in C ☆52 Updated last year
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆14 Updated last year
- A C/C++ implementation of micrograd: a tiny autograd engine with neural net on top. ☆68 Updated last year
- Flash Attention in raw CUDA C beating PyTorch ☆23 Updated last year
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆151 Updated last year
- ☆45 Updated 6 years ago
- NVIDIA tools guide ☆133 Updated 5 months ago
- Port of Karpathy's micrograd in pure C. Micrograd is a tiny scalar-valued autograd engine and a neural net library on top of it with PyTo… ☆30 Updated 10 months ago
- Learning about CUDA by writing PTX code. ☆131 Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 Updated last month
- Clover: Quantized 4-bit Linear Algebra Library ☆112 Updated 7 years ago
- CUDA Guide ☆66 Updated last year
- Serial and parallel implementations of matrix multiplication ☆41 Updated 4 years ago
- Neural network from scratch in CUDA/C++ ☆80 Updated 4 months ago
- CUDA Matrix Multiplication Optimization ☆189 Updated 10 months ago
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- GPT2 implementation in C++ using Ort ☆26 Updated 4 years ago
- Can I make an *optimizing* compiler under 1k lines of code? ☆59 Updated 3 months ago
- ☆98 Updated 2 years ago
- A header only library implementing common mathematical functions using SIMD intrinsics ☆107 Updated 3 months ago
- A tree-walker && virtual-machine && JIT interpreter for the Lox language ☆29 Updated last year
- ☆54 Updated last week
- High-Performance SGEMM on CUDA devices ☆94 Updated 4 months ago
- A recurrent (LSTM) neural network in C ☆94 Updated 3 years ago
- asynchronous/distributed speculative evaluation for llama3 ☆39 Updated 10 months ago
- C++ implementation of Lox interpreter (based on the book Crafting Interpreters by Bob Nystrom) ☆33 Updated 2 years ago
- My C++ deep learning framework & other machine learning algorithms ☆87 Updated last year
- NNCG: A Neural Network Code Generator ☆35 Updated 10 months ago
- A profiler to disclose and quantify hardware features on GPUs. ☆170 Updated 3 years ago