libxsmm / libxsmm-dnn
Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)
☆12 · Updated last month
Alternatives and similar repositories for libxsmm-dnn:
Users who are interested in libxsmm-dnn compare it to the libraries listed below.
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 · Updated last week
- ☆40 · Updated this week
- ☆33 · Updated 2 years ago
- CUDA Templates for Linear Algebra Subroutines ☆16 · Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last week
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code ☆14 · Updated 2 years ago
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆14 · Updated 11 months ago
- ☆17 · Updated 2 years ago
- A Benchmark Suite for Heterogeneous System Computation ☆53 · Updated last month
- Data-Centric MLIR dialect ☆40 · Updated last year
- ☆17 · Updated last year
- CUDAAdvisor: a GPU profiling tool ☆48 · Updated 6 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 · Updated 5 months ago
- ☆43 · Updated 4 years ago
- GPU Performance Advisor ☆64 · Updated 2 years ago
- ☆51 · Updated 5 years ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 · Updated this week
- GPTPU for SC 2021 ☆51 · Updated 2 years ago
- ROCm SPARSE marshalling library ☆67 · Updated this week
- Development repository for the Open Earth Compiler ☆79 · Updated 4 years ago
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/… ☆18 · Updated last year
- ☆61 · Updated 3 months ago
- cuASR: CUDA Algebra for Semirings ☆35 · Updated 2 years ago
- AMD optimized Sparse Linear Algebra library ☆28 · Updated last week
- Torch Frontend for IREE ☆25 · Updated last year
- ☆43 · Updated 4 years ago
- ☆53 · Updated 5 years ago
- Benchmark for measuring the performance of sparse and irregular memory access. ☆77 · Updated last month
- Bandwidth test for ROCm ☆54 · Updated 2 weeks ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator. ☆37 · Updated 3 years ago