elinx / ugrad
A C++ implementation of the scalar-valued autograd engine micrograd
☆23 Updated 5 years ago
Alternatives and similar repositories for ugrad
Users that are interested in ugrad are comparing it to the libraries listed below
- MLIR based Tiny Graph Compiler [dev-stage] ☆20 Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 Updated 7 months ago
- LLM training in simple, raw C/CUDA ☆108 Updated last year
- Neural network from scratch in CUDA/C++ ☆87 Updated 2 months ago
- Serial and parallel implementations of matrix multiplication ☆44 Updated 4 years ago
- Learn OpenCL step by step. ☆136 Updated 3 years ago
- A C++ port of karpathy/llm.c features a tiny torch library while maintaining overall simplicity. ☆40 Updated last year
- A c/c++ implementation of micrograd: a tiny autograd engine with neural net on top. ☆76 Updated 2 years ago
- A header only library implementing common mathematical functions using SIMD intrinsics ☆114 Updated 2 months ago
- Pure C ONNX runtime with zero dependencies for embedded devices ☆213 Updated 2 years ago
- CUDA Matrix Multiplication Optimization ☆239 Updated last year
- Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class ☆16 Updated 9 months ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆255 Updated last year
- A neural network implementation for the MNIST dataset, written in plain C ☆100 Updated 4 years ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆169 Updated 10 months ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆13 Updated 2 years ago
- A single header-only C++ library for automatic / algorithmic differentiation. ☆16 Updated 3 years ago
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs. ☆59 Updated last year
- ☆116 Updated 2 years ago
- Attention in SRAM on Tenstorrent Grayskull ☆39 Updated last year
- C++ demo of deep neural networks (MLP, CNN) ☆31 Updated last year
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆101 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆216 Updated 9 months ago
- Learn OpenMP examples step by step ☆101 Updated 10 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆159 Updated this week
- Step by step implementation of a fast softmax kernel in CUDA ☆55 Updated 10 months ago
- ☆17 Updated last year
- My C++ deep learning framework & other machine learning algorithms ☆88 Updated 2 years ago
- A recurrent (LSTM) neural network in C ☆95 Updated 3 years ago
- Swin Transformer C++ Implementation ☆64 Updated 4 years ago