libxsmm / libxsmm-dnn
Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)
☆12 Updated 5 months ago
Alternatives and similar repositories for libxsmm-dnn:
Users who are interested in libxsmm-dnn compare it to the libraries listed below.
- Compute applications. ☆24 Updated 5 years ago
- ☆12 Updated 3 years ago
- A portable implementation of SZ lossy compression for AMD GPUs and Hygon DCUs. ☆7 Updated 3 weeks ago
- A GPU performance prediction toolkit for CUDA programs ☆16 Updated 5 years ago
- A Benchmark Suite for Heterogeneous System Computation ☆53 Updated 2 months ago
- Simplified Interface to Complex Memory ☆27 Updated last year
- This package includes the implementation for Sparse-Matrix-Vector-Multiplication (SpMV) and Sparse-Matrix-Matrix-Multiplication (SpMM) fo… ☆10 Updated 4 years ago
- HCC Sample Applications ☆13 Updated 8 years ago
- ☆51 Updated 5 years ago
- CUDAAdvisor: a GPU profiling tool ☆48 Updated 6 years ago
- ☆14 Updated 4 years ago
- CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels ☆31 Updated 3 years ago
- AMDGPU example code in HIP/asm ☆24 Updated last week
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 Updated 3 months ago
- GPU Performance Advisor ☆63 Updated 2 years ago
- AMD optimized Sparse Linear Algebra library ☆26 Updated 2 weeks ago
- A thin wrapper around MIOpen and cuDNN ☆40 Updated last year
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆45 Updated 3 years ago
- ☆40 Updated 4 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 Updated 3 months ago
- An HPL-AI implementation for Fugaku ☆19 Updated 3 years ago
- ☆59 Updated last month
- Official BOLT Repository ☆28 Updated 5 months ago
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆18 Updated last month
- Performance Prediction Toolkit ☆51 Updated last month
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Development repository for the Open Earth Compiler ☆80 Updated 3 years ago
- A Top-Down Profiler for GPU Applications ☆14 Updated 10 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆48 Updated this week