SymbioticLab/Oobleck

A resilient distributed training framework

☆100

Alternatives and similar repositories for Oobleck

Users who are interested in Oobleck are comparing it to the repositories listed below.

uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆54 · Dec 11, 2022 · Updated 3 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆36 · Jan 9, 2023 · Updated 3 years ago
casys-kaist / EnvPipe
☆27 · Aug 31, 2023 · Updated 2 years ago
DataStates / datastates-llm
LLM checkpointing for DeepSpeed/Megatron
☆25 · Nov 30, 2025 · Updated 7 months ago
JF-D / Parcae
☆22 · Apr 22, 2024 · Updated 2 years ago
skypilot-org / spot-traces
Releasing the spot availability traces used in the "Can't Be Late" paper.
☆26 · Mar 31, 2024 · Updated 2 years ago
siasosp23 / artifacts
☆24 · Aug 15, 2023 · Updated 2 years ago
microsoft / varuna
☆250 · Jul 25, 2024 · Updated last year
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆19 · Dec 8, 2023 · Updated 2 years ago
ml-energy / zeus
Measure and optimize the energy consumption of your AI applications!
☆368 · Jul 7, 2026 · Updated last week
HPDL-Group / Merak
☆86 · Feb 11, 2026 · Updated 5 months ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 · Jan 20, 2025 · Updated last year
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆56 · Aug 6, 2025 · Updated 11 months ago
bytedance / QSync
Official repository for "QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices" (IPDPS '24).
☆20 · Feb 23, 2024 · Updated 2 years ago
kungfu-team / tenplex
Dynamic resource changes for multi-dimensional parallelism training
☆31 · Aug 22, 2025 · Updated 10 months ago
AmberLJC / FLsystem-paper
Federated Learning Systems Paper List
☆75 · Feb 7, 2024 · Updated 2 years ago
msr-fiddle / philly-traces
☆199 · Aug 31, 2019 · Updated 6 years ago
SymbioticLab / Fluid
A Generic Resource-Aware Hyperparameter Tuning Execution Engine
☆15 · Jan 8, 2022 · Updated 4 years ago
ml-energy / leaderboard-v2
A canonical source of GenAI energy benchmark and measurements
☆50 · Nov 29, 2025 · Updated 7 months ago
jaywonchung / dotfiles
Dotfile management with bare git
☆22 · Jul 4, 2026 · Updated 2 weeks ago
SymbioticLab / Salus
Fine-grained GPU sharing primitives
☆149 · Jul 28, 2025 · Updated 11 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆44 · Nov 4, 2022 · Updated 3 years ago
msr-fiddle / CheckFreq
☆57 · Jan 25, 2021 · Updated 5 years ago
microsoft / SuperScaler
An experimental parallel training platform
☆57 · Mar 25, 2024 · Updated 2 years ago
mosharaf / eecs598
Advanced Topics on Systems for X
☆288 · Jul 10, 2024 · Updated 2 years ago
S-Lab-System-Group / HeliosData
Helios Traces from SenseTime
☆63 · Sep 27, 2022 · Updated 3 years ago
stanford-futuredata / gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
☆139 · Jul 25, 2024 · Updated last year
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆37 · May 6, 2024 · Updated 2 years ago
SymbioticLab / Aequitas
Aequitas enables RPC-level QoS in datacenter networks.
☆17 · Jul 19, 2022 · Updated 4 years ago
msr-fiddle / blox
☆46 · Jul 4, 2024 · Updated 2 years ago
DS3Lab / Decentralized_FM_alpha
☆18 · May 4, 2023 · Updated 3 years ago
SymbioticLab / Sol
A Federated Execution Engine for Fast Distributed Computation Over Slow Networks
☆25 · Apr 26, 2021 · Updated 5 years ago
efeslab / eecs582
Course website for Advanced Operating Systems
☆13 · Apr 8, 2022 · Updated 4 years ago
efeslab / ConsumerBench
A benchmarking framework for on-device AI
☆19 · Jun 7, 2026 · Updated last month
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆39 · Aug 29, 2025 · Updated 10 months ago
SymbioticLab / Oort
Oort: Efficient Federated Learning via Guided Participant Selection
☆138 · Oct 27, 2021 · Updated 4 years ago
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
RulinShao / FastCkpt
Python package for rematerialization-aware gradient checkpointing
☆27 · Oct 31, 2023 · Updated 2 years ago
zhaiyi000 / tlm
☆49 · Jul 13, 2024 · Updated 2 years ago