jiaxincao / model-factory
Model Factory is an ML training platform that helps engineers build ML models at scale.
☆18 · Updated 3 years ago
Alternatives and similar repositories for model-factory:
Users that are interested in model-factory are comparing it to the libraries listed below
- A Kubernetes operator for MXNet jobs ☆53 · Updated 3 years ago
- Elastic Serverless Serving based on Kubernetes, providing zero-instance (scale-to-zero) serving capability. ☆10 · Updated 3 years ago
- ☆34 · Updated 3 years ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- Fault tolerance for DL frameworks ☆69 · Updated last year
- Studying GPU multi-tenancy ☆12 · Updated 6 years ago
- Kernel for Kubeflow in Jupyter Notebook ☆67 · Updated 5 years ago
- Common APIs and libraries shared by other Kubeflow operator repositories. ☆51 · Updated last year
- ☆48 · Updated 6 years ago
- Forked from ☆10 · Updated 3 years ago
- The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training on Kubernetes ☆44 · Updated 3 years ago
- Elastic Deep Learning Training based on Kubernetes by leveraging EDL and Volcano ☆31 · Updated last year
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes clusters (IC2E'23) ☆33 · Updated last year
- Automatic tuning for ML model deployment on Kubernetes ☆80 · Updated 2 months ago
- This repository contains statistics about AI infrastructure products. ☆18 · Updated this week
- ☆51 · Updated last year
- Paper Reading: covering distributed systems, virtualization, networking, and machine learning ☆23 · Updated 4 years ago
- Custom scheduler to deploy ML models to TRTIS for GPU sharing ☆12 · Updated 4 years ago
- WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous. ☆18 · Updated 2 years ago
- Elastic Deep Learning for deep learning frameworks on Kubernetes ☆171 · Updated last year
- Deadline-based hyperparameter tuning on RayTune. ☆31 · Updated 5 years ago
- Runtime Tracing Library for TensorFlow ☆43 · Updated 6 years ago
- SCV is a distributed cluster GPU sniffer. ☆21 · Updated last year
- sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data ☆64 · Updated 6 months ago
- An Efficient Dynamic Resource Scheduler for Deep Learning Clusters ☆42 · Updated 7 years ago
- Fork of NVIDIA device plugin for Kubernetes with support for shared GPUs by declaring GPUs multiple times ☆88 · Updated 2 years ago
- Static and dynamic inspection tool for TensorFlow models ☆24 · Updated 6 years ago
- GPU analyzer for Kubernetes GPU clusters ☆17 · Updated 4 years ago
- Enhanced networking support for TensorFlow. Maintained by SIG-networking. ☆98 · Updated 3 years ago
- Building Machine Learning Infrastructure! ☆41 · Updated 6 years ago