intel / torch-ccl
oneCCL Bindings for Pytorch*
☆102 Updated last month
Alternatives and similar repositories for torch-ccl
Users who are interested in torch-ccl are comparing it to the libraries listed below.
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆62 Updated 3 months ago
- oneAPI Collective Communications Library (oneCCL) ☆245 Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆356 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆210 Updated this week
- RCCL Performance Benchmark Tests ☆77 Updated this week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆151 Updated 3 weeks ago
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆64 Updated 2 weeks ago
- This repository contains the results and code for the MLPerf™ Training v1.0 benchmark. ☆37 Updated last year
- Python bindings for NVTX ☆66 Updated 2 years ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆166 Updated last week
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆121 Updated 10 months ago
- Home for OctoML PyTorch Profiler ☆114 Updated 2 years ago
- Ahead of Time (AOT) Triton Math Library ☆76 Updated 2 weeks ago
- Development repository for the Triton language and compiler ☆131 Updated this week
- Issues related to MLPerf™ training policies, including rules and suggested changes ☆95 Updated last week
- MLPerf™ logging library ☆37 Updated last week
- A library to analyze PyTorch traces. ☆414 Updated last week
- ☆45 Updated this week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆88 Updated last year
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud. ☆120 Updated last year
- ☆74 Updated 6 months ago
- ☆121 Updated 9 months ago
- A tensor-aware point-to-point communication primitive for machine learning ☆273 Updated last month
- ROCm Communication Collectives Library (RCCL) ☆386 Updated this week
- An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆265 Updated 2 months ago
- ☆145 Updated 8 months ago
- ☆63 Updated 9 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆161 Updated last week
- Research and development for optimizing transformers ☆130 Updated 4 years ago
- Microsoft Collective Communication Library ☆66 Updated 10 months ago