microsoft / ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
☆166 Updated 2 years ago
Alternatives and similar repositories for ArchProbe:
Users who are interested in ArchProbe are comparing it to the libraries listed below:
- ☆137 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆127 Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆130 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆212 Updated 3 years ago
- A micro Vulkan compute pipeline and a collection of benchmarking compute shaders ☆231 Updated 6 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆169 Updated last week
- Stretching GPU performance for GEMMs and tensor contractions. ☆233 Updated this week
- An extension library of WMMA API (Tensor Core API) ☆88 Updated 7 months ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆426 Updated last year
- amdgpu example code in hip/asm ☆28 Updated last week
- ☆87 Updated 10 months ago
- ☆60 Updated 2 months ago
- rocWMMA ☆100 Updated this week
- Training material for Nsight developer tools ☆148 Updated 6 months ago
- TPP experimentation on MLIR for linear algebra ☆119 Updated this week
- Dissecting NVIDIA GPU Architecture ☆88 Updated 2 years ago
- Intercept Layer for Debugging and Analyzing OpenCL Applications ☆322 Updated 2 weeks ago
- SYCL Open Source Specification ☆127 Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last year
- ☆233 Updated last week
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆97 Updated last week
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆217 Updated this week
- MLIR-based partitioning system ☆62 Updated this week
- ☆42 Updated 4 years ago
- ☆136 Updated last month
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆74 Updated last year
- Shared Middle-Layer for Triton Compilation ☆226 Updated this week
- assembler for NVIDIA FERMI. Imported from Google Code ☆72 Updated 9 years ago
- ☆308 Updated 2 months ago