mkauers / matrix-multiplication
Matrix multiplication schemes
☆202 · Updated 2 weeks ago
Alternatives and similar repositories for matrix-multiplication
Users who are interested in matrix-multiplication compare it to the repositories listed below.
- Custom PTX Instruction Benchmark ☆131 · Updated 8 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- The Quasi Quantum Assembly Programming Language ☆36 · Updated last week
- Quantum computing without the linear algebra ☆76 · Updated 4 months ago
- Learn GPU Programming in Mojo🔥 by Solving Puzzles ☆195 · Updated last week
- Visualization of cache-optimized matrix multiplication ☆155 · Updated 7 months ago
- Learning about CUDA by writing PTX code. ☆146 · Updated last year
- Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks ☆91 · Updated last week
- Exocompilation for productive programming of hardware accelerators ☆678 · Updated last week
- RDNA3 emulator ☆54 · Updated 6 months ago
- Nvidia Instruction Set Specification Generator ☆297 · Updated last year
- Tensor library with autograd using only Rust's standard library ☆70 · Updated last year
- parallelized hyperdimensional tictactoe ☆125 · Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 · Updated 6 months ago
- A package for defining deep learning models using categorical algebraic expressions. ☆61 · Updated last year
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆166 · Updated 10 months ago
- High-Performance SGEMM on CUDA devices ☆109 · Updated 9 months ago
- Competitive GPU kernel optimization platform. ☆113 · Updated last week
- Alex Krizhevsky's original code from Google Code ☆198 · Updated 9 years ago
- ☆76 · Updated this week
- ☆106 · Updated 11 months ago
- tiny code to access tenstorrent blackhole ☆60 · Updated 5 months ago
- LLM training in simple, raw C/CUDA ☆107 · Updated last year
- ☆81 · Updated 2 weeks ago
- Quantized LLM training in pure CUDA/C++. ☆214 · Updated this week
- HVM3 ☆274 · Updated last month
- a categorical deep learning compiler ☆204 · Updated last month
- Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers ☆152 · Updated 10 months ago
- A massively parallel, optimal functional runtime in Rust ☆31 · Updated last year
- Train neural networks that distill into logic circuits, using JAX ☆63 · Updated 5 months ago