uclasystem/bamboo

Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.

☆54

Alternatives and similar repositories for bamboo

Users that are interested in bamboo are comparing it to the libraries listed below.

uclasystem / VQPy
A language for video analytics
☆12 · Jan 26, 2023 · Updated 3 years ago
JF-D / Parcae
☆22 · Apr 22, 2024 · Updated 2 years ago
uclasystem / Mako
Mako is a low-pause, high-throughput garbage collector designed for memory-disaggregated datacenters.
☆15 · Sep 2, 2024 · Updated last year
uclasystem / dorylus
Dorylus: Affordable, Scalable, and Accurate GNN Training
☆76 · May 31, 2021 · Updated 5 years ago
wjy99-c / QDiff
☆10 · Sep 19, 2021 · Updated 4 years ago
uclasystem / MemLiner
MemLiner is a remote-memory-friendly runtime system.
☆31 · Nov 1, 2022 · Updated 3 years ago
vqpy / vqpy
VQPy: An object-oriented approach to modern video analytics
☆42 · Oct 28, 2024 · Updated last year
SymbioticLab / Oobleck
A resilient distributed training framework
☆100 · Apr 11, 2024 · Updated 2 years ago
microsoft / varuna
☆250 · Jul 25, 2024 · Updated last year
uclasystem / Semeru
A Memory-Disaggregated Managed Runtime.
☆67 · Aug 28, 2021 · Updated 4 years ago
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆46 · Nov 24, 2022 · Updated 3 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆72 · Mar 20, 2025 · Updated last year
microsoft / SuperScaler
An experimental parallel training platform
☆57 · Mar 25, 2024 · Updated 2 years ago
uclasystem / hermit
Hermit: Low-Latency, High-Throughput, and Transparent Remote Memory via Feedback-Directed Asynchrony
☆35 · May 29, 2024 · Updated 2 years ago
msr-fiddle / synergy
☆54 · Dec 13, 2022 · Updated 3 years ago
UCLA-SEAL / HeteroGen
HeteroGen: transpiling C to heterogeneous HLS code with automated test generation and program repair (ASPLOS 2022)
☆16 · Sep 25, 2024 · Updated last year
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆76 · Dec 11, 2020 · Updated 5 years ago
artpad6 / gemel_nsdi23
☆22 · Jan 15, 2024 · Updated 2 years ago
stanford-futuredata / gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
☆139 · Jul 25, 2024 · Updated last year
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆44 · Dec 29, 2023 · Updated 2 years ago
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆135 · Feb 22, 2024 · Updated 2 years ago
skypilot-org / spot-traces
Releasing the spot availability traces used in the "Can't Be Late" paper.
☆27 · Mar 31, 2024 · Updated 2 years ago
HPDL-Group / Merak
☆86 · Feb 11, 2026 · Updated 5 months ago
wangchenxi7 / Atlas
☆16 · Jul 9, 2024 · Updated 2 years ago
awslabs / slapo
A schedule language for large model training
☆153 · Aug 21, 2025 · Updated 11 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆44 · Nov 4, 2022 · Updated 3 years ago
microsoft / msccl
Microsoft Collective Communication Library
☆394 · Sep 20, 2023 · Updated 2 years ago
qzhang-ucr / BigFuzz
☆12 · Jun 14, 2023 · Updated 3 years ago
uclasystem / canvas
Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory
☆38 · Apr 19, 2023 · Updated 3 years ago
siasosp23 / artifacts
☆24 · Aug 15, 2023 · Updated 2 years ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆56 · May 10, 2024 · Updated 2 years ago
JF-D / Proteus
☆24 · Jul 7, 2024 · Updated 2 years ago
Nu-NSDI23 / Nu
Nu is a new datacenter system that enables developers to build fungible applications that can use datacenter resources wherever they are.
☆42 · May 14, 2024 · Updated 2 years ago
hiddenlayer2020 / ML-Job-Scheduler-MLFS
☆13 · Dec 18, 2020 · Updated 5 years ago
S-Lab-System-Group / HeliosArtifact
HeliosArtifact
☆22 · Sep 27, 2022 · Updated 3 years ago
netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆127 · May 9, 2022 · Updated 4 years ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆45 · Nov 13, 2023 · Updated 2 years ago
kwai / Megatron-Kwai
LLM training technologies developed by kwai
☆71 · Jun 30, 2026 · Updated 3 weeks ago
S-Lab-System-Group / Awesome-DL-Scheduling-Papers
☆332 · Jan 22, 2024 · Updated 2 years ago