elinx / ugrad
A C++ implementation of the scalar-valued autograd engine micrograd
☆23 Updated 5 years ago
Alternatives and similar repositories for ugrad
Users who are interested in ugrad compare it to the libraries listed below.
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆150 Updated 11 months ago
- A C/C++ implementation of micrograd: a tiny autograd engine with neural net on top. ☆67 Updated last year
- Implementation of convolution layer in different flavors ☆68 Updated 7 years ago
- CUDA Matrix Multiplication Optimization ☆186 Updated 9 months ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆14 Updated last year
- Clover: Quantized 4-bit Linear Algebra Library ☆113 Updated 6 years ago
- An Open Convolutional Neural Network Framework in C++ From Scratch ☆64 Updated 4 years ago
- Neural network from scratch in CUDA/C++ ☆80 Updated 4 months ago
- C implementation of the L-Mul f32/f16 multiplications from paper: https://arxiv.org/html/2410.00907 ☆27 Updated 7 months ago
- Serial and parallel implementations of matrix multiplication ☆40 Updated 4 years ago
- NNCG: A Neural Network Code Generator ☆35 Updated 9 months ago
- LLM training in simple, raw C/CUDA ☆95 Updated last year
- CUTLASS and CuTe Examples ☆49 Updated 4 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆350 Updated 3 weeks ago
- A header only library implementing common mathematical functions using SIMD intrinsics ☆105 Updated 3 months ago
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆126 Updated 9 months ago
- ☆45 Updated 6 years ago
- My C++ deep learning framework & other machine learning algorithms ☆87 Updated last year
- Learning about CUDA by writing PTX code. ☆129 Updated last year
- MLIR based Tiny Graph Compiler [dev-stage] ☆18 Updated 5 months ago
- Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation. ☆23 Updated 3 years ago
- ☆50 Updated last year
- Convert ONNX models to plain C++ code (without dependencies) ☆20 Updated 2 years ago
- GPT2 implementation in C++ using Ort ☆26 Updated 4 years ago
- A language and compiler for irregular tensor programs. ☆138 Updated 5 months ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆61 Updated 8 months ago
- Source code for 'Modern Parallel Programming with C++ and Assembly' by Dan Kusswurm ☆64 Updated 3 years ago
- Open Neural Network Exchange model parser in C ☆16 Updated 5 years ago
- ☆17 Updated last year
- ☆102 Updated 2 months ago