eth-easl/sailor

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/eth-easl/sailor)

eth-easl / sailor

AI model training on heterogeneous, geo-distributed resources

☆46

Alternatives and similar repositories for sailor

Users that are interested in sailor are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Relaxed-System-Lab / HexiScale
View on GitHub
Accommodating Large Language Model Training over Heterogeneous Environment.
☆32Mar 13, 2025Updated last year
spcl / crosspipe
View on GitHub
Official implementation of CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training (ATC '25), built on top of Megatro…
☆17Jul 6, 2025Updated last year
lsds / Tempo
View on GitHub
Tempo is a system for declarative, efficient, end-to-end compiled dynamic deep learning
☆30Oct 21, 2025Updated 9 months ago
eth-easl / mixtera
View on GitHub
A lightweight, user-friendly data-plane for LLM training.
☆40Sep 10, 2025Updated 10 months ago
DeepLink-org / ditorch
View on GitHub
☆31Jan 7, 2025Updated last year
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
denght23 / CAVER
View on GitHub
NS3 simulator for RDMA load balancing
☆12Jan 31, 2025Updated last year
siasosp23 / artifacts
View on GitHub
☆24Aug 15, 2023Updated 2 years ago
Networked-System-and-Security-Group / Themis
View on GitHub
ICNP'25-THEMIS: Addressing Congestion-Induced Unfairness in Long-Haul RDMA Networks
☆16Jun 27, 2026Updated 3 weeks ago
umassos / GAIA
View on GitHub
☆12Mar 27, 2024Updated 2 years ago
S-Lab-System-Group / Hydro
View on GitHub
Surrogate-based Hyperparameter Tuning System
☆30Jun 29, 2023Updated 3 years ago
bytedance / QSync
View on GitHub
Official resporitory for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20Feb 23, 2024Updated 2 years ago
DachengLi1 / AMP
View on GitHub
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆44Nov 4, 2022Updated 3 years ago
netiken / m4
View on GitHub
[TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Pratees…
☆21Jun 19, 2026Updated last month
DS3Lab / Decentralized_FM_alpha
View on GitHub
☆18May 4, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
nex-agi / NexVenusCL
View on GitHub
Nex Venus Communication Library
☆75Nov 17, 2025Updated 8 months ago
NASA-NJU / PrioPlus
View on GitHub
☆25Oct 7, 2025Updated 9 months ago
michaelzhiluo / starburst
View on GitHub
Burstable Cloud Scheduler
☆17Jun 6, 2024Updated 2 years ago
gbxu / autoccl
View on GitHub
[NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
☆34May 2, 2025Updated last year
Thesys-lab / Helix-ASPLOS25
View on GitHub
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆93Oct 15, 2025Updated 9 months ago
ByteDance-Seed / StragglerAnalysis
View on GitHub
☆56Apr 30, 2025Updated last year
netiken / m3
View on GitHub
[ACM SIGCOMM 2024] "m3: Accurate Flow-Level Performance Estimation using Machine Learning" by Chenning Li, Arash Nasr-Esfahany, Kevin Zha…
☆25Oct 2, 2024Updated last year
microsoft / SuperScaler
View on GitHub
An experimental parallel training platform
☆57Mar 25, 2024Updated 2 years ago
eth-easl / deltazip
View on GitHub
Compression for Foundation Models
☆36Jul 21, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
eth-easl / gpu-util-interference
View on GitHub
CUDA benchmarks for measuring GPU utilization and interference
☆18Feb 11, 2025Updated last year
microsoft / TE-CCL
View on GitHub
☆56Aug 27, 2024Updated last year
oliverYoung2001 / UltraAttn
View on GitHub
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16Aug 14, 2025Updated 11 months ago
DicardoX / Research-Space
View on GitHub
This repository is established to store personal notes and annotated papers during daily research.
☆201Jun 28, 2026Updated 3 weeks ago
alpa-projects / tensorflow-alpa
View on GitHub
☆23May 10, 2023Updated 3 years ago
astra-sim / astra-sim
View on GitHub
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
☆648Apr 25, 2026Updated 2 months ago
alibaba / alibaba-lingjun-dataset-2023
View on GitHub
☆67Jun 25, 2024Updated 2 years ago
eth-easl / OpenTela
View on GitHub
OpenTela is a decentralized compute fabric for running machine learning applications.
☆42Jul 14, 2026Updated last week
xinhaoc / ferret
View on GitHub
Autonomous CUDA kernel optimization agent with structured task specs and per-config scoring
☆17Jun 17, 2026Updated last month
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
LLMServe / hydraserve
View on GitHub
☆20May 11, 2026Updated 2 months ago
bigrl-team / gear
View on GitHub
A distributed GPU-centric experience replay system for large AI models.
☆19Aug 1, 2023Updated 2 years ago
arjundevraj / stragglar
View on GitHub
☆15Oct 2, 2025Updated 9 months ago
infinigence / HamiltonAttention
View on GitHub
☆45Oct 15, 2025Updated 9 months ago
sii-research / VCCL
View on GitHub
Venus Collective Communication Library, supported by SII and Infrawaves.
☆152Jun 24, 2026Updated last month
DS3Lab / AC-SGD
View on GitHub
Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**.
☆29Apr 25, 2023Updated 3 years ago
AmberLJC / megatron-system-analysis
View on GitHub
☆17Jan 31, 2026Updated 5 months ago