north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆37 · Updated 6 months ago
Alternatives and similar repositories for tensor-cores-numerical-behavior:
Users interested in tensor-cores-numerical-behavior are comparing it to the libraries listed below.
- An extension library of the WMMA API (Tensor Core API) · ☆88 · Updated 7 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators · ☆73 · Updated last year
- Dissecting NVIDIA GPU Architecture · ☆88 · Updated 2 years ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware · ☆105 · Updated 2 months ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning · ☆134 · Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) · ☆124 · Updated 4 years ago
- GPU Performance Advisor · ☆64 · Updated 2 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) for deep learning on Tensor Cores · ☆85 · Updated 2 years ago
- rocWMMA · ☆100 · Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆38 · Updated 9 months ago
- High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline · ☆97 · Updated 7 months ago
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles · ☆56 · Updated this week
- GVProf: A Value Profiler for GPU-based Clusters · ☆49 · Updated 10 months ago
- A Data-Centric Compiler for Machine Learning · ☆82 · Updated last year
- Collection of benchmarks to measure basic GPU capabilities · ☆296 · Updated last week
- A Winograd Minimal Filter Implementation in CUDA · ☆24 · Updated 3 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper · ☆66 · Updated 6 years ago
- RCCL Performance Benchmark Tests · ☆59 · Updated last month