intel / torch-ccl
oneCCL Bindings for Pytorch*
☆93 · Updated this week
Alternatives and similar repositories for torch-ccl:
Users who are interested in torch-ccl are comparing it to the libraries listed below:
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆62 · Updated last month
- oneAPI Collective Communications Library (oneCCL) ☆232 · Updated last week
- OpenAI Triton backend for Intel® GPUs ☆175 · Updated this week
- Issues related to MLPerf™ training policies, including rules and suggested changes ☆94 · Updated last week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆134 · Updated last week
- Computation using data flow graphs for scalable machine learning ☆67 · Updated this week
- ☆61 · Updated 3 months ago
- Benchmarks to capture important workloads. ☆31 · Updated 2 months ago
- Synthesizer for optimal collective communication algorithms ☆105 · Updated last year
- RCCL Performance Benchmark Tests ☆60 · Updated this week
- Python bindings for NVTX ☆66 · Updated last year
- ☆27 · Updated this week
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆38 · Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆65 · Updated 3 years ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated this week
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆116 · Updated last year
- ROCm Communication Collectives Library (RCCL) ☆316 · Updated last week
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆60 · Updated last month
- Reference implementations of MLPerf™ HPC training benchmarks ☆47 · Updated last month
- RDMA and SHARP plugins for nccl library ☆187 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆314 · Updated this week
- CUDA Templates for Linear Algebra Subroutines ☆20 · Updated this week
- Microsoft Collective Communication Library ☆342 · Updated last year
- An extension library of WMMA API (Tensor Core API) ☆95 · Updated 9 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆330 · Updated this week
- ☆49 · Updated last year
- ☆197 · Updated 9 months ago
- ☆37 · Updated this week
- Intel® Tensor Processing Primitives extension for Pytorch* ☆14 · Updated last week
- A Python library transfers PyTorch tensors between CPU and NVMe ☆113 · Updated 4 months ago