microsoft / ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
☆168 · Updated 2 years ago
Alternatives and similar repositories for ArchProbe:
Users who are interested in ArchProbe are comparing it to the libraries listed below.
- A micro Vulkan compute pipeline and a collection of benchmarking compute shaders ☆234 · Updated 7 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆129 · Updated last year
- An extension library of WMMA API (Tensor Core API) ☆91 · Updated 8 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆214 · Updated 3 years ago
- ☆138 · Updated this week
- Dissecting NVIDIA GPU Architecture ☆90 · Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 · Updated 7 years ago
- ☆91 · Updated 11 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- Stretching GPU performance for GEMMs and tensor contractions. ☆233 · Updated last week
- TPP experimentation on MLIR for linear algebra ☆121 · Updated last week
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆116 · Updated 2 years ago
- ☆61 · Updated 3 months ago
- ☆237 · Updated last month
- Training material for Nsight developer tools ☆151 · Updated 7 months ago
- rocWMMA ☆104 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆314 · Updated this week
- Demonstration of various hardware effects on CUDA GPUs. ☆365 · Updated last year
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆80 · Updated last year
- Assembler for NVIDIA Fermi. Imported from Google Code. ☆72 · Updated 10 years ago
- ☆43 · Updated 4 years ago
- Sample benchmark demonstrating the VK_KHR_cooperative_matrix extension ☆86 · Updated 2 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆130 · Updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆359 · Updated last week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆72 · Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99 · Updated 3 weeks ago
- AMD's graph optimization engine. ☆213 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆232 · Updated last week
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆367 · Updated 6 months ago
- LLVM-Based Pipeline Compiler ☆175 · Updated last week