nginyc / rafiki

Rafiki is a distributed system that supports training and deployment of machine learning models using AutoML, built with ease-of-use in mind.

☆35

Alternatives and similar repositories for rafiki

Users who are interested in rafiki are comparing it to the repositories listed below.

SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 · Updated 2 years ago
linnanwang / superneurons-release
this is the release repository of superneurons
☆52 · Updated 4 years ago
tbd-ai / tbd-suite
☆47 · Updated 2 years ago
netx-repo / PipeSwitch
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆126 · Updated 3 years ago
stanford-mast / INFaaS
Model-less Inference Serving
☆88 · Updated last year
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆41 · Updated 2 years ago
msr-fiddle / harmony
☆16 · Updated 2 years ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 · Updated 3 years ago
uwsampl / nexus
☆82 · Updated last week
DS3Lab / DT-FM
☆94 · Updated 2 years ago
awslabs / lorien
☆43 · Updated last year
SymbioticLab / Salus
Fine-grained GPU sharing primitives
☆141 · Updated 5 years ago
YuhanLiu11 / AutoFreeze
☆22 · Updated 4 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆52 · Updated 10 months ago
DS3Lab / LambdaML
Machine learning on serverless platform
☆9 · Updated 2 years ago
ucbrise / hypersched
Deadline-based hyperparameter tuning on RayTune.
☆31 · Updated 5 years ago
byteps / examples
BytePS examples (Vision, NLP, GAN, etc)
☆19 · Updated 2 years ago
GuanhuaWang / sensAI
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
☆64 · Updated 11 months ago
jiazhihao / metaflow_sysml19
Repository for SysML19 Artifacts Evaluation
☆54 · Updated 6 years ago
hogepodge / tvm-docker
A basic Docker-based installation of TVM
☆11 · Updated 3 years ago
eth-easl / cachew
ML Input Data Processing as a Service. This repository contains the source code for Cachew (built on top of TensorFlow).
☆38 · Updated 9 months ago
HKBU-HPML / ddl-benchmarks
ddl-benchmarks: Benchmarks for Distributed Deep Learning
☆37 · Updated 5 years ago
msr-fiddle / CoorDL
☆24 · Updated 2 years ago
UofT-EcoSystem / hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
☆32 · Updated last year
uw-mad-dash / shockwave
Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23]
☆44 · Updated 2 years ago
kungfu-team / tenplex
Dynamic resources changes for multi-dimensional parallelism training
☆25 · Updated 7 months ago
ucbrise / cs294-ai-sys-fa19
CS294-162; Machine Learning Systems Seminar
☆31 · Updated 2 years ago
microsoft / SuperScaler
An experimental parallel training platform
☆54 · Updated last year
TalwalkarLab / paleo
An analytical performance modeling tool for deep neural networks.
☆89 · Updated 4 years ago
hku-systems / naspipe
☆14 · Updated 3 years ago