microsoft / ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
☆162 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for ArchProbe
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆124 · Updated last year
- Stretching GPU performance for GEMMs and tensor contractions. ☆223 · Updated this week
- A micro Vulkan compute pipeline and a collection of benchmarking compute shaders ☆227 · Updated 3 months ago
- ☆128 · Updated this week
- Conversion to/from half-precision floating point formats ☆333 · Updated 3 months ago
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆55 · Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆124 · Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆201 · Updated 2 years ago
- An extension library of WMMA API (Tensor Core API) ☆84 · Updated 4 months ago
- rocWMMA ☆91 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆191 · Updated this week
- AMD's graph optimization engine. ☆186 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆270 · Updated this week
- TPP experimentation on MLIR for linear algebra ☆110 · Updated last month
- collection of benchmarks to measure basic GPU capabilities ☆265 · Updated 5 months ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆95 · Updated last week
- ☆59 · Updated this week
- ☆80 · Updated 7 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆83 · Updated 9 months ago
- CUDA Matrix Multiplication Optimization ☆141 · Updated 4 months ago
- Intercept Layer for Debugging and Analyzing OpenCL Applications ☆314 · Updated 2 weeks ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆313 · Updated this week
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 · Updated 6 months ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆88 · Updated last week
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- Conversions to MLIR EmitC ☆124 · Updated 2 months ago
- CUDA Kernel Benchmarking Library ☆519 · Updated this week
- Sample benchmark demonstrating the VK_KHR_cooperative_matrix extension ☆66 · Updated last month
- Demonstration of various hardware effects on CUDA GPUs. ☆358 · Updated 11 months ago
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading. ☆109 · Updated 2 years ago