eth-easl / modyn

Modyn is a research platform for training ML models on growing datasets.

☆50

Alternatives and similar repositories for modyn

Users who are interested in modyn compare it to the libraries listed below.

vqpy / vqpy
VQPy: An object-oriented approach to modern video analytics
☆42 Updated last year
eth-easl / cachew
ML Input Data Processing as a Service. This repository contains the source code for Cachew (built on top of TensorFlow).
☆39 Updated last year
WukLab / preble
Stateful LLM Serving
☆88 Updated 8 months ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆96 Updated last year
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆80 Updated 8 months ago
stanford-mast / INFaaS
Model-less Inference Serving
☆91 Updated 2 years ago
HuaizhengZhang / MIGProfiler
Multi-Instance-GPU profiling tool
☆60 Updated 2 years ago
msr-fiddle / DS-Analyzer
☆38 Updated 4 years ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 Updated 2 months ago
ml-energy / zeus
Measure and optimize the energy consumption of your AI applications!
☆307 Updated last week
feature-store / ralf
☆31 Updated 3 years ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆191 Updated last year
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆79 Updated last month
S-Lab-System-Group / Hydro
Surrogate-based Hyperparameter Tuning System
☆27 Updated 2 years ago
awslabs / slapo
A schedule language for large model training
☆151 Updated 3 months ago
ByteDance-Seed / StragglerAnalysis
☆43 Updated 6 months ago
hao-ai-lab / MuxServe
☆79 Updated last month
DS3Lab / DT-FM
☆93 Updated 3 years ago
geoffxy / habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
☆62 Updated 2 years ago
Azure / msccl
Microsoft Collective Communication Library
☆66 Updated 11 months ago
tyler-griggs / melange-release
☆48 Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆91 Updated 2 years ago
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆63 Updated last month
microsoft / varuna
☆252 Updated last year
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: An Unified Checkpointing Library for LFMs
☆252 Updated 4 months ago
casys-kaist / EnvPipe
☆25 Updated 2 years ago
MaoZiming / papers
Paper-reading notes for Berkeley OS prelim exam.
☆14 Updated last year
microsoft / SuperScaler
An experimental parallel training platform
☆56 Updated last year
WukLab / InferCept
☆31 Updated last year
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25 Updated last year