elinx / ugrad
A C++ implementation of the scalar-valued autograd engine micrograd
☆23Updated 5 years ago
Alternatives and similar repositories for ugrad
Users who are interested in ugrad are comparing it to the libraries listed below
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs☆367Updated 6 months ago
- Neural network from scratch in CUDA/C++☆87Updated 2 months ago
- A header only library implementing common mathematical functions using SIMD intrinsics☆113Updated last month
- A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity.☆38Updated last year
- An Open Convolutional Neural Network Framework in C++ From Scratch☆66Updated 4 years ago
- A recurrent (LSTM) neural network in C☆95Updated 3 years ago
- A C/C++ implementation of micrograd: a tiny autograd engine with a neural net on top.☆73Updated 2 years ago
- LLM training in simple, raw C/CUDA☆107Updated last year
- Learn OpenMP examples step by step☆98Updated 9 months ago
- Learning about CUDA by writing PTX code.☆146Updated last year
- ☆114Updated 2 years ago
- Serial and parallel implementations of matrix multiplication☆44Updated 4 years ago
- ☆46Updated 7 years ago
- Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation.☆26Updated 3 years ago
- High-Performance SGEMM on CUDA devices☆109Updated 9 months ago
- Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class☆16Updated 8 months ago
- NVIDIA tools guide☆145Updated 10 months ago
- MLIR-based toolkit targeting Intel heterogeneous hardware☆48Updated 8 months ago
- C++ demo of deep neural networks (MLP, CNN)☆31Updated last year
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode…☆139Updated 2 weeks ago
- A simple, fast, minimalistic header-only library for running async tasks and executing task graphs.☆56Updated 11 months ago
- GPT-2 in C☆76Updated 10 months ago
- CUDA Matrix Multiplication Optimization☆235Updated last year
- Inline PTX Assembly in CUDA example☆13Updated 3 years ago
- MLIR based Tiny Graph Compiler [dev-stage]☆20Updated 11 months ago
- Custom PTX Instruction Benchmark☆131Updated 8 months ago
- Open Neural Network Exchange to C compiler.☆330Updated last week
- Visualization of cache-optimized matrix multiplication☆155Updated 7 months ago
- Pure C ONNX runtime with zero dependencies for embedded devices☆212Updated 2 years ago
- Learn OpenCL step by step.☆135Updated 3 years ago