kfish / micrograd-cpp-2023
A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library
☆14 · Updated last year
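The description above calls micrograd a tiny scalar-valued autograd engine. To show what that means in C++, here is a minimal sketch of scalar reverse-mode automatic differentiation. It is an illustration only, not code from micrograd-cpp-2023: the names `Node`, `Value`, `make_value`, `tanh_value`, and `backward` are hypothetical, and the choice of `std::shared_ptr`-owned graph nodes is an assumption that may differ from how the port manages memory.

```cpp
#include <cmath>
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical scalar graph node (illustrative sketch, not the repo's API).
struct Node {
    double data;
    double grad = 0.0;
    std::vector<std::shared_ptr<Node>> parents;  // inputs this node was computed from
    std::function<void()> backward_fn = [] {};   // pushes this node's grad into its parents
    explicit Node(double d) : data(d) {}
};

using Value = std::shared_ptr<Node>;

Value make_value(double d) { return std::make_shared<Node>(d); }

Value operator+(const Value& a, const Value& b) {
    auto out = make_value(a->data + b->data);
    out->parents = {a, b};
    Node* o = out.get();  // raw pointer: capturing `out` itself would create a reference cycle
    out->backward_fn = [a, b, o] {
        a->grad += o->grad;  // d(a+b)/da = 1
        b->grad += o->grad;  // d(a+b)/db = 1
    };
    return out;
}

Value operator*(const Value& a, const Value& b) {
    auto out = make_value(a->data * b->data);
    out->parents = {a, b};
    Node* o = out.get();
    out->backward_fn = [a, b, o] {
        a->grad += b->data * o->grad;  // d(a*b)/da = b
        b->grad += a->data * o->grad;  // d(a*b)/db = a
    };
    return out;
}

Value tanh_value(const Value& a) {
    double t = std::tanh(a->data);
    auto out = make_value(t);
    out->parents = {a};
    Node* o = out.get();
    out->backward_fn = [a, o, t] {
        a->grad += (1.0 - t * t) * o->grad;  // d tanh(a)/da = 1 - tanh(a)^2
    };
    return out;
}

// Reverse-mode pass: order nodes topologically, seed d(root)/d(root) = 1,
// then run each node's local backward step from the output toward the inputs.
void backward(const Value& root) {
    std::vector<Node*> topo;
    std::unordered_set<Node*> visited;
    std::function<void(Node*)> build = [&](Node* n) {
        if (!visited.insert(n).second) return;  // already visited
        for (const auto& p : n->parents) build(p.get());
        topo.push_back(n);  // pushed only after all of its inputs
    };
    build(root.get());
    root->grad = 1.0;
    for (auto it = topo.rbegin(); it != topo.rend(); ++it) (*it)->backward_fn();
}
```

For example, building `z = x * y + x` with `x = 2` and `y = -3` and calling `backward(z)` leaves `x->grad == -2` and `y->grad == 2`. Gradients are accumulated with `+=` so that a node used more than once (here `x`) receives every contribution.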
Alternatives and similar repositories for micrograd-cpp-2023
Users who are interested in micrograd-cpp-2023 are comparing it to the libraries listed below.
- Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sort… ☆15 · Updated last year
- ☆17 · Updated last year
- SYCL Reference Manual ☆28 · Updated last year
- Header-only safetensors loader and saver in C++ ☆62 · Updated 3 weeks ago
- MLIR-based toolkit targeting Intel heterogeneous hardware ☆44 · Updated 3 months ago
- An extension library of WMMA API (Tensor Core API) ☆97 · Updated 10 months ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆52 · Updated 2 months ago
- AMD’s C++ library for accelerating tensor primitives ☆42 · Updated this week
- ☆19 · Updated 2 weeks ago
- Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core. ☆62 · Updated 9 months ago
- Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class ☆15 · Updated 3 months ago
- Serial and parallel implementations of matrix multiplication ☆41 · Updated 4 years ago
- A practical way of learning Swizzle ☆20 · Updated 4 months ago
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content] ☆67 · Updated 2 years ago
- TPP experimentation on MLIR for linear algebra ☆131 · Updated last week
- A minimal (really) out-of-tree MLIR example ☆44 · Updated 3 weeks ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆88 · Updated this week
- CUTLASS and CuTe Examples ☆54 · Updated 5 months ago
- NVIDIA tools guide ☆133 · Updated 5 months ago
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆127 · Updated 5 months ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆67 · Updated 4 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆110 · Updated 8 months ago
- amdgpu example code in hip/asm ☆32 · Updated this week
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- Retargetable ML compilers for the twenty-first century! ☆13 · Updated last month
- rocWMMA ☆114 · Updated last week
- ☆23 · Updated 3 years ago
- Machine Learning Compiler Road Map ☆43 · Updated last year
- A C++ port of karpathy/llm.c featuring a tiny torch library while maintaining overall simplicity. ☆33 · Updated 10 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆192 · Updated 3 months ago