oneapi-src / oneCCL
oneAPI Collective Communications Library (oneCCL)
☆201 · Updated this week
Related projects
Alternatives and complementary repositories for oneCCL
- ROCm Communication Collectives Library (RCCL) ☆267 · Updated this week
- oneCCL Bindings for Pytorch* ☆86 · Updated last week
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆201 · Updated last week
- Unified Collective Communication Library ☆205 · Updated this week
- NCCL Profiling Kit ☆109 · Updated 4 months ago
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- High-performance, GPU-aware communication library ☆84 · Updated 2 weeks ago
- RDMA and SHARP plugins for nccl library ☆160 · Updated 3 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆57 · Updated 2 months ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆122 · Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆220 · Updated this week
- Computation using data flow graphs for scalable machine learning ☆66 · Updated this week
- RCCL Performance Benchmark Tests ☆48 · Updated 2 weeks ago
- Magnum IO community repo ☆80 · Updated 5 months ago
- GPUDirect Async support for IB Verbs ☆90 · Updated last year
- AMD's graph optimization engine. ☆185 · Updated this week
- oneAPI Level Zero Specification Headers and Loader ☆218 · Updated last week
- An implementation of BLAS using the SYCL open standard. ☆259 · Updated last week
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆99 · Updated this week
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆113 · Updated 11 months ago
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆127 · Updated this week
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆363 · Updated 2 months ago
- Collection of benchmarks to measure basic GPU capabilities ☆264 · Updated 4 months ago
- STREAM, for lots of devices written in many programming models ☆325 · Updated 2 months ago
- Synthesizer for optimal collective communication algorithms ☆98 · Updated 7 months ago