mkauers / matrix-multiplication
Matrix multiplication schemes
☆207 Updated last week
Alternatives and similar repositories for matrix-multiplication
Users who are interested in matrix-multiplication are comparing it to the repositories listed below.
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 Updated last year
- Visualization of cache-optimized matrix multiplication ☆157 Updated 10 months ago
- Custom PTX Instruction Benchmark ☆138 Updated 11 months ago
- parallelized hyperdimensional tictactoe ☆126 Updated last year
- Quantum computing without the linear algebra ☆78 Updated 2 months ago
- The Quasi Quantum Assembly Programming Language ☆36 Updated 2 months ago
- tiny code to access tenstorrent blackhole ☆61 Updated 8 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆377 Updated 9 months ago
- Learn GPU Programming in Mojo🔥 by Solving Puzzles ☆288 Updated last week
- Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks ☆98 Updated last month
- Learning about CUDA by writing PTX code. ☆152 Updated last year
- Exocompilation for productive programming of hardware accelerators ☆708 Updated this week
- Nvidia Instruction Set Specification Generator ☆311 Updated last year
- A package for defining deep learning models using categorical algebraic expressions. ☆61 Updated last year
- High-Performance FP32 GEMM on CUDA devices ☆117 Updated last year
- RDNA3 emulator ☆55 Updated 9 months ago
- Competitive GPU kernel optimization platform. ☆153 Updated this week
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆182 Updated last year
- LLM training in simple, raw C/CUDA ☆112 Updated last year
- Tensor library with autograd using only Rust's standard library ☆71 Updated last year
- Quantized LLM training in pure CUDA/C++. ☆238 Updated 3 weeks ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆185 Updated this week
- ☆38 Updated 2 years ago
- A collection of optimization problems in mathematics ☆194 Updated last week
- Train neural networks that distill into logic circuits, using JAX ☆64 Updated 8 months ago
- Super fast FP32 matrix multiplication on RDNA3 ☆82 Updated 10 months ago
- The Cosmos numerical relativity code (with unstructured AMR) ☆20 Updated last year
- A Learning Journey: Micrograd in Mojo 🔥 ☆65 Updated last year
- A massively parallel, optimal functional runtime in Rust ☆31 Updated last year
- Experimental GPU language with meta-programming ☆25 Updated last year