libxsmm / libxsmm-dnn
Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)
☆12 Updated last week
Alternatives and similar repositories for libxsmm-dnn:
Users who are interested in libxsmm-dnn compare it to the libraries listed below.
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆39 Updated last week
- SYCL Reference Manual ☆27 Updated 11 months ago
- SYCL Benchmark Suite ☆64 Updated 2 months ago
- A Benchmark Suite for Heterogeneous System Computation ☆53 Updated 2 months ago
- ☆15 Updated last week
- ☆41 Updated this week
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 Updated last month
- Bandwidth test for ROCm ☆54 Updated last week
- Data-Centric MLIR dialect ☆40 Updated last year
- ROCm SPARSE marshalling library ☆67 Updated this week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 6 months ago
- CUDA Templates for Linear Algebra Subroutines ☆20 Updated this week
- ☆60 Updated 4 months ago
- ☆14 Updated 4 years ago
- ☆17 Updated last year
- ☆51 Updated 5 years ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated last week
- GPU Performance Advisor ☆64 Updated 2 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆82 Updated last week
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/… ☆18 Updated last year
- Development repository for the Open Earth Compiler ☆80 Updated 4 years ago
- ☆43 Updated 4 years ago
- ☆46 Updated this week
- AMD’s C++ library for accelerating tensor primitives ☆39 Updated this week
- Performance Prediction Toolkit ☆51 Updated 4 months ago
- MIOpenGEMM is now deprecated ☆62 Updated last year
- oneAPI Level Zero Conformance & Performance test content ☆48 Updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last month
- Yaksa: High-performance Noncontiguous Data Management ☆13 Updated 6 months ago
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code ☆15 Updated 2 years ago