CoffeeBeforeArch / mmul
Serial and parallel implementations of matrix multiplication
☆40 Updated 4 years ago
Alternatives and similar repositories for mmul
Users who are interested in mmul are comparing it to the repositories listed below
- ☆43 Updated 4 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 Updated last month
- ☆29 Updated 5 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆62 Updated last month
- ☆23 Updated 3 years ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆16 Updated 5 years ago
- CUDA Matrix Multiplication Optimization ☆186 Updated 9 months ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆131 Updated 4 years ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆54 Updated 2 weeks ago
- An extension library of WMMA API (Tensor Core API) ☆96 Updated 10 months ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 Updated 10 months ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆131 Updated 4 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆81 Updated last year
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆88 Updated last year
- AMD’s C++ library for accelerating tensor primitives ☆40 Updated this week
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program ☆22 Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last month
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels ☆18 Updated 2 weeks ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆33 Updated 2 months ago
- Benchmark for measuring the performance of sparse and irregular memory access. ☆76 Updated last week
- Learn OpenMP examples step by step ☆93 Updated 3 months ago
- ☆50 Updated last year
- ☆67 Updated 11 years ago
- MagmaDNN: a simple deep learning framework in C++ ☆49 Updated 4 years ago
- Fast Matrix Multiplication Implementation in C programming language. This matrix multiplication algorithm is similar to what Numpy uses t… ☆33 Updated 3 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆108 Updated this week
- NVIDIA tools guide ☆132 Updated 4 months ago
- Advanced Profiling and Analytics for AMD Hardware ☆154 Updated this week
- ☆18 Updated 5 years ago