[NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
☆32May 2, 2025Updated last year
Alternatives and similar repositories for autoccl
Users that are interested in autoccl are comparing it to the libraries listed below.
- ☆24Feb 12, 2025Updated last year
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning"☆17Aug 4, 2020Updated 5 years ago
- ☆25May 9, 2025Updated last year
- [TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Pratees…☆21Jun 19, 2026Updated last week
- [NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo …☆16Dec 16, 2024Updated last year
- Demo for testing dynamic loading of the libos module.☆10Nov 8, 2023Updated 2 years ago
- ☆88Sep 15, 2025Updated 9 months ago
- Sample Codes using NVSHMEM on Multi-GPU☆30Jan 22, 2023Updated 3 years ago
- Based on Differential Cryptanalysis of the Full 16-round DES (Eli Biham / Adi Shamir); all comments and the report are written in Korean.☆10Dec 19, 2022Updated 3 years ago
- A vLLM plugin built on the FlagOS unified multi-chip backend.☆61Updated this week
- Atomo: Communication-efficient Learning via Atomic Sparsification☆29Dec 9, 2018Updated 7 years ago
- [ICLR 2025] DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models☆19Mar 25, 2025Updated last year
- A minimum demo for PyTorch distributed extension functionality for collectives.☆15Jul 29, 2024Updated last year
- Elastic computing platform☆33Updated this week
- Explore Inter-layer Expert Affinity in MoE Model Inference☆16May 6, 2024Updated 2 years ago
- A WIP project based on CAP-VM☆18Nov 9, 2023Updated 2 years ago
- MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters☆21Apr 21, 2023Updated 3 years ago
- Source code of Fuyao, built on Nightcore☆17Mar 8, 2024Updated 2 years ago
- University of Chinese Academy of Sciences (UCAS) graduate course: Advanced Operating Systems, 2023 discussion questions☆12Dec 24, 2023Updated 2 years ago
- ☆19Oct 2, 2023Updated 2 years ago
- Code for reproducing experiments performed for Accordion☆13Jun 11, 2021Updated 5 years ago
- Here is the repo for public scripts.☆12Jul 16, 2022Updated 3 years ago
- Official implementation of the paper "SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training"☆44Dec 11, 2024Updated last year
- ☆13Aug 6, 2022Updated 3 years ago
- ☆10Sep 3, 2017Updated 8 years ago
- Reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources☆12May 21, 2024Updated 2 years ago
- SocksDirect code repository☆20May 6, 2026Updated last month
- High Performance KV Cache Store for LLM☆56May 20, 2026Updated last month
- ☆27Nov 5, 2022Updated 3 years ago
- Accelerated in CUDA☆11Oct 28, 2022Updated 3 years ago
- MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning☆12Apr 26, 2021Updated 5 years ago
- An interface to program any congestion control protocol for an unreliable connection based protocol sent over UDP. It comes with a clean …☆12Apr 8, 2022Updated 4 years ago
- AI Cluster Observability & Troubleshooting Toolkit. Powered by SII & Infrawaves.☆36Apr 29, 2026Updated 2 months ago
- ☆14Sep 29, 2017Updated 8 years ago
- Tile-based language built for AI computation across all scales☆170Jun 16, 2026Updated last week
- Cheetah is a system that optimizes queries using programmable switches.☆21Jun 25, 2020Updated 6 years ago
- Slowdown prediction module of Echo: Simulating Distributed Training at Scale☆13May 17, 2025Updated last year
- This repository contains a SystemVerilog implementation of a parametrized Round Robin arbiter with three instantiation options☆13Jan 28, 2024Updated 2 years ago
- ☆29May 24, 2024Updated 2 years ago