elinx / ugrad
A C++ implementation of the scalar-valued autograd engine micrograd
☆23Updated 5 years ago
Alternatives and similar repositories for ugrad
Users that are interested in ugrad are comparing it to the libraries listed below
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs☆370Updated 7 months ago
- LLM training in simple, raw C/CUDA☆108Updated last year
- A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity.☆39Updated last year
- Neural network from scratch in CUDA/C++☆87Updated 3 months ago
- A header-only library implementing common mathematical functions using SIMD intrinsics☆114Updated 3 months ago
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode…☆141Updated 2 months ago
- A recurrent (LSTM) neural network in C☆95Updated 3 years ago
- Learning about CUDA by writing PTX code.☆150Updated last year
- High-Performance SGEMM on CUDA devices☆113Updated 10 months ago
- Custom PTX Instruction Benchmark☆136Updated 9 months ago
- ☆86Updated last month
- Some CUDA example code with READMEs.☆179Updated last month
- Serial and parallel implementations of matrix multiplication☆44Updated 4 years ago
- NVIDIA tools guide☆150Updated 11 months ago
- MLIR based Tiny Graph Compiler [dev-stage]☆20Updated last year
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA☆174Updated 11 months ago
- An Open Convolutional Neural Network Framework in C++ From Scratch☆66Updated 4 years ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library☆14Updated 2 years ago
- A C/C++ implementation of micrograd: a tiny autograd engine with a neural net on top.☆77Updated 2 years ago
- ☆27Updated 9 months ago
- A simple, fast, minimalistic header-only library for running async tasks and executing task graphs.☆60Updated last year
- ☆136Updated 2 years ago
- Step by step implementation of a fast softmax kernel in CUDA☆59Updated 11 months ago
- Pure C ONNX runtime with zero dependencies for embedded devices☆213Updated 2 years ago
- CUDA Matrix Multiplication Optimization☆245Updated last year
- Neural Network framework using Backpropagation in C☆78Updated 3 years ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml☆302Updated last year
- GPUOcelot: A dynamic compilation framework for PTX☆219Updated 10 months ago
- Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation.☆26Updated 3 years ago
- Scalar-valued Automatic Differentiation library in C☆53Updated 2 years ago