kfish / micrograd-cpp-2023
A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library
☆14 Updated last year
Alternatives and similar repositories for micrograd-cpp-2023
Users interested in micrograd-cpp-2023 are comparing it to the libraries listed below
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆52 Updated last month
- LLM training in simple, raw C/CUDA ☆95 Updated last year
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- MLIR-based toolkit targeting Intel heterogeneous hardware ☆41 Updated 2 months ago
- SYCL Reference Manual ☆27 Updated last year
- A minimal (really) out-of-tree MLIR example ☆44 Updated this week
- Source code for 'Modern Parallel Programming with C++ and Assembly' by Dan Kusswurm ☆64 Updated 3 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆191 Updated 3 months ago
- AMD’s C++ library for accelerating tensor primitives ☆40 Updated this week
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆127 Updated 4 months ago
- Examples from Programming in Parallel with CUDA ☆143 Updated 2 years ago
- amdgpu example code in hip/asm ☆32 Updated last month
- A header-only library implementing common mathematical functions using SIMD intrinsics ☆105 Updated 3 months ago
- GPU B-Tree with support for versioning (snapshots). ☆47 Updated 6 months ago
- rocWMMA ☆111 Updated this week
- ☆23 Updated 3 years ago
- Header-only safetensors loader and saver in C++ ☆61 Updated last week
- Task graph-based asynchronous programming system using C++ coroutine ☆89 Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆133 Updated 4 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q… ☆46 Updated 6 months ago
- TPP experimentation on MLIR for linear algebra ☆130 Updated this week
- ☆56 Updated last month
- ☆15 Updated last week
- C implementation of the L-Mul f32/f16 multiplications from the paper: https://arxiv.org/html/2410.00907 ☆27 Updated 7 months ago
- A simple, fast, minimalistic header-only library for running async tasks and executing task graphs. ☆53 Updated 5 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆92 Updated last month
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 Updated 11 months ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆39 Updated last week
- MLIR Metal dialect ☆26 Updated 8 months ago
- A GLSL compiler targeting SPIR-V MLIR ☆20 Updated 7 months ago