libxsmm / libxsmm-dnn
Reference implementation of Deep Neural Network primitives using LIBXSMM's Tensor Processing Primitives (TPP)
☆12 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for libxsmm-dnn
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated last month
- Simplified Interface to Complex Memory ☆26 · Updated last year
- A Benchmark Suite for Heterogeneous System Computation ☆52 · Updated 3 weeks ago
- Compute applications. ☆25 · Updated 4 years ago
- Open source of an IBM Optimized version of the HPCG benchmark. ☆14 · Updated 8 months ago
- Development repository for the Open Earth Compiler ☆77 · Updated 3 years ago
- GPU Performance Advisor ☆63 · Updated 2 years ago
- Official BOLT Repository ☆27 · Updated 3 months ago
- A Top-Down Profiler for GPU Applications ☆13 · Updated 8 months ago
- CUDAAdvisor: a GPU profiling tool ☆48 · Updated 6 years ago
- Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, an extension to O… ☆38 · Updated 3 years ago
- Data-Centric MLIR dialect ☆38 · Updated last year
- Chai ☆42 · Updated 11 months ago
- PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolution… ☆18 · Updated last month
- Use tensor cores to calculate back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction. ☆11 · Updated last year
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 · Updated 7 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆34 · Updated 9 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27 · Updated 2 months ago
- A GPU performance prediction toolkit for CUDA programs ☆16 · Updated 5 years ago
- An HPL-AI implementation for Fugaku ☆19 · Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆14 · Updated 5 years ago
- Yaksa: High-performance Noncontiguous Data Management ☆13 · Updated last month
- The SparseX sparse kernel optimization library ☆39 · Updated 5 years ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 · Updated 5 years ago
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 · Updated last month