feature-store / ralf
☆30 · Updated 2 years ago
Alternatives and similar repositories for ralf:
Users who are interested in ralf are comparing it to the libraries listed below.
- Distributed ML Optimizer ☆30 · Updated 3 years ago
- FTPipe and related pipeline model parallelism research. ☆41 · Updated last year
- A resilient distributed training framework ☆88 · Updated 9 months ago
- ML Input Data Processing as a Service. This repository contains the source code for Cachew (built on top of TensorFlow). ☆36 · Updated 4 months ago
- Tracking Ray Enhancement Proposals ☆48 · Updated this week
- Lightning In-Memory Object Store ☆44 · Updated 2 years ago
- UCCL: an Efficient Collective Communication Library for GPUs ☆18 · Updated this week
- Deadline-based hyperparameter tuning on RayTune. ☆31 · Updated 5 years ago
- ☆15 · Updated last year
- Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances. ☆48 · Updated 2 years ago
- ☆43 · Updated 3 years ago
- Model-less Inference Serving ☆83 · Updated last year
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) ☆80 · Updated last year
- ☆44 · Updated last year
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆109 · Updated 10 months ago
- Modyn is a research-platform for training ML models on growing datasets. ☆38 · Updated this week
- Exoshuffle-CloudSort ☆24 · Updated last year
- A universal workflow system for exactly-once DAGs ☆23 · Updated last year
- ☆23 · Updated last year
- Data System for Optimized Deep Learning Model Selection ☆20 · Updated 2 years ago
- Microsoft Collective Communication Library ☆60 · Updated last month
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆126 · Updated 2 years ago
- Simple Distributed Deep Learning on TensorFlow ☆134 · Updated 2 years ago
- sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data ☆64 · Updated 5 months ago
- Releasing the spot availability traces used in "Can't Be Late" paper. ☆17 · Updated 9 months ago
- A Generic Resource-Aware Hyperparameter Tuning Execution Engine ☆15 · Updated 3 years ago
- Cloud Native Benchmarking of Foundation Models ☆21 · Updated 2 months ago
- A schedule language for large model training ☆143 · Updated 7 months ago
- RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads ☆42 · Updated 3 years ago
- Python package for rematerialization-aware gradient checkpointing ☆24 · Updated last year