kfish / micrograd-cpp-2023
A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library
☆14 · Updated 2 years ago
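To illustrate what "scalar-valued autograd engine" means, here is a minimal sketch in C++, the repository's own language. It is not taken from micrograd-cpp-2023. The names `Node`, `Value`, `make_value`, `tanh_v`, and `backward` are hypothetical, chosen only for illustration. Each node holds one scalar and an accumulated gradient. Each operation records a closure that applies the chain rule to its inputs, and `backward` topologically sorts the graph and runs those closures in reverse order.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical scalar node, loosely modeled on micrograd's Value (not the repo's API).
struct Node {
    double data;
    double grad = 0.0;                           // d(output)/d(this), accumulated
    std::vector<std::shared_ptr<Node>> parents;  // inputs that produced this node
    std::function<void()> backward_fn = [] {};   // pushes this->grad into parents
    explicit Node(double d) : data(d) {}
};

using Value = std::shared_ptr<Node>;

Value make_value(double d) { return std::make_shared<Node>(d); }

Value operator+(const Value& a, const Value& b) {
    auto out = make_value(a->data + b->data);
    out->parents = {a, b};
    Node* o = out.get();  // raw pointer: the closure is owned by *o, so no cycle
    out->backward_fn = [a, b, o] {
        a->grad += o->grad;  // d(a+b)/da = 1
        b->grad += o->grad;  // d(a+b)/db = 1
    };
    return out;
}

Value operator*(const Value& a, const Value& b) {
    auto out = make_value(a->data * b->data);
    out->parents = {a, b};
    Node* o = out.get();
    out->backward_fn = [a, b, o] {
        a->grad += b->data * o->grad;  // d(a*b)/da = b
        b->grad += a->data * o->grad;  // d(a*b)/db = a
    };
    return out;
}

Value tanh_v(const Value& a) {
    double t = std::tanh(a->data);
    auto out = make_value(t);
    out->parents = {a};
    Node* o = out.get();
    out->backward_fn = [a, o, t] { a->grad += (1.0 - t * t) * o->grad; };
    return out;
}

// Reverse-mode pass: order nodes so each appears after its inputs, then
// walk that order backwards, seeding d(root)/d(root) = 1.
void backward(const Value& root) {
    assert(root && "backward() needs a non-null root");
    std::vector<Node*> topo;
    std::unordered_set<Node*> visited;
    std::function<void(Node*)> build = [&](Node* n) {
        if (!visited.insert(n).second) return;  // already placed
        for (auto& p : n->parents) build(p.get());
        topo.push_back(n);
    };
    build(root.get());
    root->grad = 1.0;
    for (auto it = topo.rbegin(); it != topo.rend(); ++it) (*it)->backward_fn();
}
```

For example, with `a = 2` and `b = -3`, building `c = a * b + a` and calling `backward(c)` leaves `a->grad` at -2 (that is, `b + 1`) and `b->grad` at 2. Gradients are accumulated with `+=` so that a node used on several paths, like `a` here, receives the contribution from each one.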
Alternatives and similar repositories for micrograd-cpp-2023
Users that are interested in micrograd-cpp-2023 are comparing it to the libraries listed below
- MLIR-based toolkit targeting Intel heterogeneous hardware · ☆50 · Updated 9 months ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis · ☆56 · Updated 9 months ago
- Teaching vectorization and SIMD using Intel intrinsics in a Computer Organization and Architecture class · ☆16 · Updated 10 months ago
- Generate simple index ranges in C++ and CUDA C++ · ☆39 · Updated 2 years ago
- Easier, quicker command-line CUDA profiling · ☆37 · Updated last year
- Source code for 'Modern Parallel Programming with C++ and Assembly' by Dan Kusswurm · ☆71 · Updated 3 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators · ☆123 · Updated last month
- A minimal (really) out-of-tree MLIR example · ☆46 · Updated 4 months ago
- ☆23 · Updated 3 years ago
- ☆85 · Updated this week
- Development repository for the Open Earth Compiler · ☆81 · Updated 4 years ago
- LLM training in simple, raw C/CUDA · ☆108 · Updated last year
- A lightweight memory allocator for hardware-accelerated machine learning · ☆176 · Updated 2 months ago
- SYCL Conformance Tests · ☆70 · Updated this week
- 🚧 A work-in-progress GLSL compiler targeting SPIR-V MLIR 🚧 · ☆22 · Updated last year
- ☆59 · Updated this week
- ☆17 · Updated last year
- 🎃 GPU load-balancing library for regular and irregular computations · ☆63 · Updated 3 months ago
- High-performance SGEMM on CUDA devices · ☆113 · Updated 10 months ago
- Little OpenMP Library · ☆169 · Updated 3 years ago
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust: all it takes to sum a lot of numbers fast! · ☆113 · Updated 4 months ago
- AMDGPU example code in HIP/asm · ☆46 · Updated this week
- TPP experimentation on MLIR for linear algebra · ☆140 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆138 · Updated last week
- SYCL Reference Manual · ☆28 · Updated last year
- Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading · ☆158 · Updated 3 years ago
- Tenstorrent MLIR compiler · ☆218 · Updated last week
- Super fast FP32 matrix multiplication on RDNA3 · ☆81 · Updated 8 months ago
- Examples from Programming in Parallel with CUDA · ☆167 · Updated 2 years ago
- A simple, fast, minimalistic header-only library for running async tasks and executing task graphs · ☆60 · Updated last year