elinx / ugrad
A C++ implementation of the scalar-valued autograd engine micrograd
☆23 Updated 4 years ago
Alternatives and similar repositories for ugrad:
Users who are interested in ugrad are comparing it to the libraries listed below:
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆14 Updated last year
- A recurrent (LSTM) neural network in C ☆93 Updated 3 years ago
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- TinyFive is a lightweight RISC-V emulator and assembler written in Python with neural network examples ☆59 Updated last year
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆125 Updated 9 months ago
- Serial and parallel implementations of matrix multiplication ☆40 Updated 4 years ago
- CUDA Matrix Multiplication Optimization ☆181 Updated 9 months ago
- Neural network from scratch in CUDA/C++ ☆78 Updated 3 months ago
- A header only library implementing common mathematical functions using SIMD intrinsics ☆103 Updated 2 months ago
- A C++ port of karpathy/llm.c features a tiny torch library while maintaining overall simplicity. ☆27 Updated 8 months ago
- MLIR based Tiny Graph Compiler [dev-stage] ☆17 Updated 5 months ago
- CUDA for MNIST training/inference ☆40 Updated last year
- Can I make an *optimizing* compiler under 1k lines of code? ☆56 Updated 2 months ago
- Learning about CUDA by writing PTX code. ☆128 Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆348 Updated this week
- Scalar-valued Automatic Differentiation library in C ☆51 Updated last year
- Class of High Performance Computing taken at U.T.P 2017 ☆55 Updated 7 years ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆150 Updated 10 months ago
- pytorch from scratch in pure C/CUDA and python ☆40 Updated 6 months ago
- A simple and fast minimalistic header-only library allowing to run async tasks and execute task graphs. ☆53 Updated 4 months ago
- Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation. ☆23 Updated 3 years ago
- Fast CUDA matrix multiplication from scratch ☆697 Updated last year
- Pure C ONNX runtime with zero dependencies for embedded devices ☆204 Updated last year
- ☆17 Updated 11 months ago
- My C++ deep learning framework & other machine learning algorithms ☆87 Updated last year
- TPP experimentation on MLIR for linear algebra ☆127 Updated last week
- Tenstorrent MLIR compiler ☆120 Updated this week
- Reference Kernels for the Leaderboard ☆33 Updated last week
- A minimal (really) out-of-tree MLIR example ☆44 Updated last week
- Attention in SRAM on Tenstorrent Grayskull ☆34 Updated 9 months ago