spcl / dace
DaCe - Data Centric Parallel Programming
☆542 Updated this week
Alternatives and similar repositories for dace
Users who are interested in dace are comparing it to the libraries listed below.
- Kernel Tuner ☆351 Updated this week
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆206 Updated 2 months ago
- STREAM, for lots of devices written in many programming models ☆344 Updated 10 months ago
- ☆260 Updated last month
- ☆247 Updated last month
- A Python compiler design toolkit. ☆370 Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆137 Updated this week
- Python SYCL bindings and SYCL-based Python Array API library ☆114 Updated this week
- Advanced Profiling and Analytics for AMD Hardware ☆159 Updated this week
- ☆158 Updated last week
- collection of benchmarks to measure basic GPU capabilities ☆391 Updated 5 months ago
- Pluto: An automatic polyhedral parallelizer and locality optimizer ☆298 Updated 3 weeks ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆277 Updated 3 weeks ago
- CUDA Kernel Benchmarking Library ☆679 Updated this week
- Benchmark for measuring the performance of sparse and irregular memory access. ☆78 Updated 2 months ago
- A code generator for array-based code on CPUs and GPUs ☆608 Updated last week
- Unified Collective Communication Library ☆259 Updated last week
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆85 Updated last month
- A Data-Centric Compiler for Machine Learning ☆84 Updated last year
- development repository for the open earth compiler ☆80 Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆246 Updated last week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆748 Updated 4 months ago
- The Foundation for All Legate Libraries ☆218 Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆91 Updated last week
- TPP experimentation on MLIR for linear algebra ☆133 Updated last week
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆132 Updated 5 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆224 Updated 3 years ago
- A lightweight, Pythonic, frontend for MLIR ☆81 Updated last year
- The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information. ☆216 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆200 Updated 5 months ago