libxsmm / libxsmm-dnn
Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)
☆12 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for libxsmm-dnn
- A Benchmark Suite for Heterogeneous System Computation ☆52 · Updated 2 weeks ago
- GPU Performance Advisor ☆63 · Updated 2 years ago
- SYCL Reference Manual ☆25 · Updated 6 months ago
- This package includes the implementation for Sparse-Matrix-Vector-Multiplication (SpMV) and Sparse-Matrix-Matrix-Multiplication (SpMM) fo… ☆10 · Updated 4 years ago
- ☆11 · Updated 3 years ago
- A GPU performance prediction toolkit for CUDA programs ☆16 · Updated 5 years ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆35 · Updated this week
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com ☆38 · Updated 11 months ago
- ☆47 · Updated 5 years ago
- Compute applications. ☆25 · Updated 4 years ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆74 · Updated this week
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 · Updated last month
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆65 · Updated 10 months ago
- ☆59 · Updated this week
- CUDAAdvisor: a GPU profiling tool ☆48 · Updated 6 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated last month
- Chai ☆42 · Updated 11 months ago
- ☆40 · Updated 3 years ago
- ☆17 · Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- cuASR: CUDA Algebra for Semirings ☆34 · Updated 2 years ago
- HCC Sample Applications ☆13 · Updated 7 years ago
- ☆29 · Updated 2 years ago
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆18 · Updated last month
- Bandwidth test for ROCm ☆47 · Updated this week
- Kernel Tuning Toolkit ☆55 · Updated last week
- SYCL Benchmark Suite ☆56 · Updated 2 months ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27 · Updated last month
- An HPL-AI implementation for Fugaku ☆19 · Updated 3 years ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 · Updated 5 years ago