hpdps-group / coccl
COCCL: Compression and precision co-aware collective communication library
☆24 · Updated 3 months ago
Alternatives and similar repositories for coccl
Users who are interested in coccl compare it to the libraries listed below.
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs ☆12 · Updated 3 months ago
- Reference implementations of MLPerf™ HPC training benchmarks ☆48 · Updated 4 months ago
- Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner ☆20 · Updated last year
- ☆18 · Updated 5 years ago
- FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Data on GPUs ☆14 · Updated last year
- A Micro-benchmarking Tool for HPC Networks ☆30 · Updated 2 weeks ago
- A hierarchical collective communications library with portable optimizations ☆35 · Updated 7 months ago
- JUPITER Benchmark Suite ☆18 · Updated 11 months ago
- Graph-indexed Pandas DataFrames for analyzing hierarchical performance data ☆34 · Updated last week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆33 · Updated 3 months ago
- ☆17 · Updated this week
- Benchmark implementation of CosmoFlow in TensorFlow Keras ☆21 · Updated last year
- ☆18 · Updated last year
- Slides and exercises for persistent memory programming tutorial ☆13 · Updated 2 years ago
- An HPL-AI implementation for Fugaku ☆21 · Updated 4 years ago
- A tracing infrastructure for heterogeneous computing applications. ☆33 · Updated this week
- AI Accelerators-SC23-tutorial Repository ☆11 · Updated last year
- Benchmark for measuring the performance of sparse and irregular memory access. ☆78 · Updated 2 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆91 · Updated this week
- Benchmarks ☆17 · Updated 2 months ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 · Updated last year
- Cosmic Tagging Network for Neutrino Physics ☆13 · Updated last year
- Very-Low Overhead Checkpointing System ☆58 · Updated 6 months ago
- Logger for MPI communication ☆27 · Updated 2 years ago
- A Data-Centric Compiler for Machine Learning ☆84 · Updated last year
- NAS Parallel Benchmarks for evaluating GPU and APIs ☆26 · Updated last month
- ☆45 · Updated 4 years ago
- ☆10 · Updated 3 months ago
- High Performance Linpack for Next-Generation AMD HPC Accelerators ☆58 · Updated 3 weeks ago
- Chai ☆44 · Updated last year