enp1s0 / ozIMMU
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
☆56 · Updated last month
Alternatives and similar repositories for ozIMMU:
Users who are interested in ozIMMU are comparing it to the libraries listed below.
- An extension library of WMMA API (Tensor Core API) ☆91 · Updated 8 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. By pro… ☆68 · Updated this week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆30 · Updated 3 months ago
- ☆25 · Updated this week
- AMD’s C++ library for accelerating tensor primitives ☆39 · Updated this week
- ☆15 · Updated 5 months ago
- ☆17 · Updated 5 years ago
- A demo of how to write a high-performance convolution that runs on Apple silicon ☆54 · Updated 3 years ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆51 · Updated 3 weeks ago
- ☆49 · Updated last year
- study of cutlass ☆21 · Updated 4 months ago
- cuASR: CUDA Algebra for Semirings ☆35 · Updated 2 years ago
- GPU Performance Advisor ☆64 · Updated 2 years ago
- rocWMMA ☆102 · Updated this week
- Next generation library for iterative sparse solvers for ROCm platform ☆78 · Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency ☆106 · Updated 6 months ago
- ☆39 · Updated 5 years ago
- ☆91 · Updated 11 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. ☆59 · Updated 2 weeks ago
- Bandwidth test for ROCm ☆54 · Updated last week
- Dissecting NVIDIA GPU Architecture ☆90 · Updated 2 years ago
- CUDA Templates for Linear Algebra Subroutines ☆16 · Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆128 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 4 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 · Updated 5 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 · Updated 7 months ago
- ☆43 · Updated 4 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆39 · Updated this week
- CUDA Template Functions ☆19 · Updated 3 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆43 · Updated 2 weeks ago