intel / torch-ccl
oneCCL Bindings for Pytorch*
☆100 · Updated 2 weeks ago
Alternatives and similar repositories for torch-ccl
Users that are interested in torch-ccl are comparing it to the libraries listed below
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆62 · Updated last month
- oneAPI Collective Communications Library (oneCCL) ☆241 · Updated 2 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆348 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆200 · Updated last week
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆37 · Updated last year
- RCCL Performance Benchmark Tests ☆73 · Updated 3 weeks ago
- Home for OctoML PyTorch Profiler ☆113 · Updated 2 years ago
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆63 · Updated 3 weeks ago
- Ahead of Time (AOT) Triton Math Library ☆75 · Updated this week
- Python bindings for NVTX ☆66 · Updated 2 years ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆149 · Updated this week
- ☆144 · Updated 6 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆85 · Updated last year
- Development repository for the Triton language and compiler ☆127 · Updated this week
- Issues related to MLPerf™ training policies, including rules and suggested changes ☆95 · Updated this week
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆120 · Updated 8 months ago
- ☆41 · Updated this week
- ☆62 · Updated 8 months ago
- Microsoft Collective Communication Library ☆66 · Updated 9 months ago
- A tensor-aware point-to-point communication primitive for machine learning ☆262 · Updated last week
- ☆122 · Updated this week
- MLIR-based partitioning system ☆120 · Updated last week
- A library of GPU kernels for sparse matrix operations. ☆270 · Updated 4 years ago
- Training material for Nsight developer tools ☆163 · Updated last year
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆132 · Updated 3 years ago
- Example of using PyTorch's open device registration API ☆30 · Updated 2 years ago
- System for automated integration of deep learning backends. ☆47 · Updated 3 years ago
- ☆74 · Updated 4 months ago
- A CUTLASS implementation using SYCL ☆35 · Updated last week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆158 · Updated last month