uw-mad-dash/shockwave

Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]

☆46

Alternatives and similar repositories for shockwave

Users who are interested in shockwave are comparing it to the repositories listed below.

stanford-futuredata / gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
☆139 · Jul 25, 2024 · Updated last year
hiddenlayer2020 / ML-Job-Scheduler-MLFS
☆13 · Dec 18, 2020 · Updated 5 years ago
uw-mad-dash / Accordion
Code for reproducing experiments performed for Accordion
☆13 · Jun 11, 2021 · Updated 5 years ago
gudiandian / ElasticFlow
☆17 · May 10, 2024 · Updated 2 years ago
ruipeterpan / paper_notes
Personal blog + reading notes on system-ish papers
☆17 · Oct 29, 2023 · Updated 2 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆36 · Jan 9, 2023 · Updated 3 years ago
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆55 · Dec 11, 2022 · Updated 3 years ago
Rivendile / Muri
Artifacts for our SIGCOMM'22 paper Muri
☆44 · Dec 29, 2023 · Updated 2 years ago
harvard-cns / Harvard-CNS-Seminar
Reading seminar in Harvard Cloud Networking and Systems Group
☆16 · Aug 29, 2022 · Updated 3 years ago
S-Lab-System-Group / Lucid
Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
☆61 · May 21, 2023 · Updated 3 years ago
siasosp23 / artifacts
☆24 · Aug 15, 2023 · Updated 2 years ago
ruipeterpan / torch_profiler
Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo
☆12 · Feb 12, 2023 · Updated 3 years ago
columbia / PrivateKube
Privacy Budget Orchestration in Machine Learning Workloads (OSDI '21)
☆27 · Oct 20, 2023 · Updated 2 years ago
microsoft / elasticflow-traces
Integrated Training Platform (ITP) traces used in ElasticFlow paper.
☆31 · Dec 23, 2022 · Updated 3 years ago
systems-seminar-uiuc / systems-seminar-uiuc.github.io
Website for Systems Research Seminar at UIUC
☆21 · May 7, 2026 · Updated 2 months ago
bytedance / QSync
Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 · Feb 23, 2024 · Updated 2 years ago
SymbioticLab / Tiresias
Tiresias is a GPU cluster manager for distributed deep learning training.
☆166 · May 7, 2020 · Updated 6 years ago
uclasystem / VQPy
A language for video analytics
☆12 · Jan 26, 2023 · Updated 3 years ago
UWNetworksLab / meshinsight
MeshInsight: Dissecting Overheads of Service Mesh Sidecars
☆48 · Dec 21, 2023 · Updated 2 years ago
heyfey / vodascheduler
GPU scheduler for elastic/distributed deep learning workloads in Kubernetes cluster (IC2E'23)
☆33 · Nov 11, 2023 · Updated 2 years ago
S-Lab-System-Group / HeliosData
Helios Traces from SenseTime
☆63 · Sep 27, 2022 · Updated 3 years ago
msr-fiddle / blox
☆47 · Jul 4, 2024 · Updated 2 years ago
S-Lab-System-Group / ChronusArtifact
☆23 · Jan 7, 2022 · Updated 4 years ago
S-Lab-System-Group / Primo
Primo: Practical Learning-Augmented Systems with Interpretable Models
☆19 · Dec 26, 2023 · Updated 2 years ago
operate-first / ai-for-cloud-ops
Boston University Collaboratory project for applying AI to cloud operations
☆12 · Dec 5, 2022 · Updated 3 years ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆24 · Nov 21, 2024 · Updated last year
MingjiHan99 / KVRaft
MIT 6.824 2020
☆10 · Mar 31, 2021 · Updated 5 years ago
llm-db / FineInfer
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)
☆19 · May 28, 2024 · Updated 2 years ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆56 · May 10, 2024 · Updated 2 years ago
ShawnZhong / MadFS
Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems”
☆51 · Mar 5, 2023 · Updated 3 years ago
S-Lab-System-Group / Awesome-DL-Scheduling-Papers
☆333 · Jan 22, 2024 · Updated 2 years ago
Romero027 / sysnet-reading-list
This repository contains a list of papers on various topics (that I am working/worked on) in the system and networking area.
☆89 · Feb 13, 2026 · Updated 5 months ago
raywan-110 / AdaQP
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
☆24 · Mar 1, 2024 · Updated 2 years ago
michaelzhiluo / starburst
Burstable Cloud Scheduler
☆17 · Jun 6, 2024 · Updated 2 years ago
msr-fiddle / philly-traces
☆198 · Aug 31, 2019 · Updated 6 years ago
Froot-NetSys / Arya
Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling
☆18 · Sep 27, 2023 · Updated 2 years ago
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43 · May 29, 2022 · Updated 4 years ago
marius-team / marius
Large scale graph learning on a single machine.
☆167 · Feb 25, 2025 · Updated last year
appnet-org / appnet
Expressive, Easy to Build, and High-Performance Application Networks
☆20 · Jul 1, 2025 · Updated last year