☆49 · Aug 27, 2024 · Updated last year
Alternatives and similar repositories for TE-CCL
Users who are interested in TE-CCL are comparing it to the libraries listed below.
- ☆16 · Apr 22, 2025 · Updated 10 months ago
- TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches ☆80 · Jul 25, 2023 · Updated 2 years ago
- Codebase for Teal (SIGCOMM 2023) ☆60 · Apr 19, 2024 · Updated last year
- Managed collective communication service ☆23 · Sep 2, 2024 · Updated last year
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning ☆32 · Jun 13, 2025 · Updated 8 months ago
- ☆18 · May 3, 2024 · Updated last year
- Microsoft Collective Communication Library ☆66 · Nov 23, 2024 · Updated last year
- An evaluation framework for data center traffic engineering. ☆13 · Jul 28, 2024 · Updated last year
- Codebase for FIGRET (SIGCOMM 2024) ☆25 · Sep 24, 2024 · Updated last year
- Microsoft's open source max-min fair solver for cluster scheduling and traffic engineering ☆18 · Feb 11, 2026 · Updated 2 weeks ago
- ☆29 · Dec 2, 2022 · Updated 3 years ago
- LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models ☆12 · May 7, 2024 · Updated last year
- ☆64 · Jun 25, 2024 · Updated last year
- ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale ☆522 · Jan 3, 2026 · Updated 2 months ago
- NCCL Profiling Kit ☆152 · Jul 1, 2024 · Updated last year
- Synthesizer for optimal collective communication algorithms ☆124 · Apr 8, 2024 · Updated last year
- ☆17 · Oct 8, 2024 · Updated last year
- Microsoft Collective Communication Library ☆385 · Sep 20, 2023 · Updated 2 years ago
- [ACM SIGCOMM 2024] "m3: Accurate Flow-Level Performance Estimation using Machine Learning" by Chenning Li, Arash Nasr-Esfahany, Kevin Zha… ☆25 · Oct 2, 2024 · Updated last year
- ☆24 · Dec 15, 2025 · Updated 2 months ago
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- ☆44 · Jul 4, 2024 · Updated last year
- NS3 simulator for RDMA load balancing ☆11 · Jan 31, 2025 · Updated last year
- A Throughput-Centric View of the Performance of Datacenter Topologies [SIGCOMM'21] ☆10 · May 25, 2021 · Updated 4 years ago
- ☆813 · Dec 31, 2025 · Updated 2 months ago
- ☆44 · Sep 6, 2021 · Updated 4 years ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025] ☆13 · Dec 9, 2024 · Updated last year
- ☆10 · Apr 29, 2023 · Updated 2 years ago
- A minimum demo for PyTorch distributed extension functionality for collectives. ☆15 · Jul 29, 2024 · Updated last year
- ☆10 · Nov 25, 2023 · Updated 2 years ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning ☆13 · Nov 1, 2021 · Updated 4 years ago
- CausIL is an approach to estimate the causal graph for a cloud microservice system, where the nodes are the service-specific metrics whil… ☆13 · Jul 3, 2023 · Updated 2 years ago
- ☆15 · Jan 7, 2023 · Updated 3 years ago
- ☆19 · Jun 1, 2025 · Updated 9 months ago
- NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including the implementation of DCQCN, TIMELY, PFC, ECN and shared buffer swi… ☆347 · Aug 16, 2018 · Updated 7 years ago
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c… ☆27 · Dec 10, 2022 · Updated 3 years ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆475 · Updated this week
- GPU-accelerated LLM Training Simulator ☆17 · Jun 26, 2025 · Updated 8 months ago
- Mitigating Routing Update Overhead for Traffic Engineering by Combining Destination-based Routing with Reinforcement Learning ☆15 · Oct 16, 2022 · Updated 3 years ago