kfish / micrograd-cpp-2023
A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library
☆14 · Updated 2 years ago
Alternatives and similar repositories for micrograd-cpp-2023
Users who are interested in micrograd-cpp-2023 are comparing it to the libraries listed below.
- Easier, quicker command-line CUDA profiling · ☆44 · Updated last year
- Header-only safetensors loader and saver in C++ · ☆76 · Updated last month
- ☆23 · Updated 3 years ago
- Little OpenMP Library · ☆170 · Updated 3 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. · ☆57 · Updated 10 months ago
- Embedded Universal DSL: a good DSL for us, by us · ☆66 · Updated this week
- 🎃 GPU load-balancing library for regular and irregular computations. · ☆66 · Updated 4 months ago
- LLM training in simple, raw C/CUDA · ☆112 · Updated last year
- Generate simple index ranges in C++ and CUDA C++ · ☆39 · Updated 2 years ago
- Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL, even under multithreading. · ☆163 · Updated 3 years ago
- A minimal (really) out-of-tree MLIR example · ☆46 · Updated 5 months ago
- MLIR-based toolkit targeting Intel heterogeneous hardware · ☆51 · Updated 11 months ago
- Use tensor cores to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions. · ☆13 · Updated 2 years ago
- AMD’s C++ library for accelerating tensor primitives · ☆48 · Updated last week
- High-Performance FP32 GEMM on CUDA devices · ☆117 · Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs · ☆376 · Updated 9 months ago
- ☆13 · Updated this week
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q… · ☆48 · Updated last year
- Compiler-agnostic metaprogramming library providing concepts, type operations and tuples for C++ and CUDA · ☆97 · Updated last month
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… · ☆49 · Updated 4 years ago
- ☆18 · Updated last year
- MLIR metal dialect · ☆36 · Updated last year
- Monorepo for the OpenCilk compiler. Forked from llvm/llvm-project and based on Tapir/LLVM. · ☆119 · Updated last week
- 🚧 A work-in-progress GLSL compiler targeting SPIR-V mlir 🚧 · ☆22 · Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. · ☆32 · Updated 10 months ago
- Development repository for the Open Earth Compiler · ☆81 · Updated 4 years ago
- A lightweight memory allocator for hardware-accelerated machine learning · ☆180 · Updated 4 months ago
- TPP experimentation on MLIR for linear algebra · ☆142 · Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo · ☆32 · Updated this week
- A simple, fast, minimalistic header-only library for running async tasks and executing task graphs. · ☆61 · Updated last year