intel / torch-ccl
oneCCL Bindings for Pytorch* (deprecated)
☆104 · Updated last month
Alternatives and similar repositories for torch-ccl
Users who are interested in torch-ccl compare it to the libraries listed below.
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆63 · Updated 7 months ago
- oneAPI Collective Communications Library (oneCCL) ☆253 · Updated last month
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆375 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆86 · Updated last week
- OpenAI Triton backend for Intel® GPUs ☆226 · Updated this week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆155 · Updated last week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆93 · Updated 2 years ago
- Python bindings for NVTX ☆67 · Updated 2 years ago
- Issues related to MLPerf® Inference policies, including rules and suggested changes ☆63 · Updated last week
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆36 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆144 · Updated last week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆164 · Updated this week
- ☆61 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆410 · Updated this week
- A tensor-aware point-to-point communication primitive for machine learning ☆283 · Updated last month
- Development repository for the Triton language and compiler ☆140 · Updated last week
- System for automated integration of deep learning backends. ☆47 · Updated 3 years ago
- PyTorch RFCs (experimental) ☆138 · Updated 8 months ago
- ☆59 · Updated last week
- A library of GPU kernels for sparse matrix operations. ☆283 · Updated 5 years ago
- Ahead of Time (AOT) Triton Math Library ☆88 · Updated this week
- SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs ☆64 · Updated last week
- ☆145 · Updated last year
- ☆159 · Updated last year
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆122 · Updated 2 years ago
- Home for OctoML PyTorch Profiler ☆113 · Updated 2 years ago
- A Python library transfers PyTorch tensors between CPU and NVMe ☆125 · Updated last year
- ☆74 · Updated this week
- Computation using data flow graphs for scalable machine learning ☆68 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆254 · Updated last week