spcl / dace
DaCe - Data Centric Parallel Programming
☆535 Updated this week
Alternatives and similar repositories for dace
Users who are interested in dace are comparing it to the libraries listed below.
- Kernel Tuner ☆340 Updated this week
- A Data-Centric Compiler for Machine Learning ☆83 Updated last year
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆206 Updated last month
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆81 Updated 3 weeks ago
- CUDA Kernel Benchmarking Library ☆656 Updated last week
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆293 Updated 2 months ago
- This is the top-level repository for the Accel-Sim framework. ☆418 Updated this week
- Unified Collective Communication Library ☆256 Updated this week
- Rich editor for SDFGs with included profiling and debugging, static analysis, and interactive optimization. ☆19 Updated 4 months ago
- Rodinia benchmark ☆179 Updated 2 years ago
- collection of benchmarks to measure basic GPU capabilities ☆377 Updated 3 months ago
- An out-of-tree MLIR dialect template. ☆101 Updated 9 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆135 Updated last week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆268 Updated this week
- A code generator for array-based code on CPUs and GPUs ☆604 Updated last week
- ☆250 Updated this week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆718 Updated 3 months ago
- Data Parallel Extension for Numba ☆81 Updated 6 months ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆506 Updated 2 years ago
- A Python Compiler Design Toolkit ☆360 Updated this week
- C/C++ frontend for MLIR. Also features polyhedral optimizations, parallel optimizations, and more! ☆547 Updated last week
- Backward compatible ML compute opset inspired by HLO/MHLO ☆485 Updated this week
- ☆258 Updated this week
- ☆245 Updated last week
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆408 Updated this week
- ☆416 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆242 Updated last week
- Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures. ☆395 Updated this week
- Python interface for MLIR - the Multi-Level Intermediate Representation ☆257 Updated 6 months ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 Updated 2 weeks ago