north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆37Updated 7 months ago
Alternatives and similar repositories for tensor-cores-numerical-behavior:
Users who are interested in tensor-cores-numerical-behavior are comparing it to the libraries listed below
- ☆91Updated 11 months ago
- Dissecting NVIDIA GPU Architecture☆90Updated 2 years ago
- ☆17Updated 5 years ago
- An extension library of WMMA API (Tensor Core API)☆91Updated 8 months ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆126Updated 4 years ago
- ☆48Updated 5 years ago
- ☆39Updated 5 years ago
- ☆138Updated this week
- ☆61Updated 3 months ago
- CUDA Templates for Linear Algebra Subroutines☆16Updated this week
- SparseTIR: Sparse Tensor Compiler for Deep Learning☆135Updated last year
- rocWMMA☆102Updated this week
- ☆43Updated 4 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆79Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. By pro…☆68Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆39Updated this week
- ☆25Updated this week
- ☆87Updated last week
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores☆50Updated last year
- ☆49Updated last year
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline.☆100Updated 8 months ago
- Assembler for NVIDIA Volta and Turing GPUs☆214Updated 3 years ago
- ☆73Updated 4 months ago
- A Winograd Minimal Filter Implementation in CUDA☆24Updated 3 years ago
- GPU Performance Advisor☆64Updated 2 years ago
- An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores.☆51Updated 8 months ago
- CUDA Matrix Multiplication Optimization☆173Updated 8 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆105Updated 7 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API☆76Updated last year
- OpenAI Triton backend for Intel® GPUs☆169Updated this week