ruipeterpan / cs759-sp21

CS/ECE/ME/EP 759 (High Performance Computing for Engineering Applications) Course Project: Cautiously Aggressive GPU Space Sharing to Improve Resource Utilization and Job Efficiency

☆8

Alternatives and similar repositories for cs759-sp21

Users who are interested in cs759-sp21 are comparing it to the repositories listed below

uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆44 Updated 2 years ago
msr-fiddle / CheckFreq
☆54 Updated 4 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 Updated 2 years ago
stanford-mast / INFaaS
Model-less Inference Serving
☆88 Updated last year
netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆126 Updated 3 years ago
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆36 Updated 3 years ago
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆57 Updated last year
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆50 Updated 2 years ago
ruipeterpan / paper_notes
Personal blog + reading notes on system-ish papers
☆15 Updated last year
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆42 Updated 3 years ago
netx-repo / training-bottleneck
Analyze network performance in distributed training
☆18 Updated 4 years ago
chenhao-ye / cs537-sp21-discussion
Discussion section materials for COMP SCI 537 2021 Spring at the University of Wisconsin-Madison.
☆16 Updated 4 years ago
casys-kaist / HUVM
☆24 Updated 2 years ago
HuaizhengZhang / MIGProfiler
Multi-Instance-GPU profiling tool
☆60 Updated 2 years ago
jasperzhong / read-papers-and-code
My paper/code reading notes in Chinese
☆46 Updated last month
sitar-lab / NeuSight
☆45 Updated 3 weeks ago
rkhan055 / SHADE
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
☆35 Updated 2 years ago
Soroosh129 / NeuOS
Source code for the paper: "A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems"
☆22 Updated 4 years ago
casys-kaist / EnvPipe
☆25 Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆52 Updated 10 months ago
uclasystem / VQPy
A language for video analytics
☆13 Updated 2 years ago
Raphael-Hao / Abacus
☆37 Updated 3 weeks ago
Romero027 / sysnet-reading-list
This repository contains a list of papers on various topics (that I am working/worked on) in the system and networking area.
☆83 Updated 2 months ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆52 Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆82 Updated 2 years ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆107 Updated last year
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46 Updated last year
msr-fiddle / synergy
☆51 Updated 2 years ago
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆95 Updated 2 years ago
geoffxy / habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
☆63 Updated 2 years ago