ucbrise / caravel
Studying GPU Multi-tenancy
☆12Updated 6 years ago
Alternatives and similar repositories for caravel:
Users who are interested in caravel are comparing it to the libraries listed below
- Runtime Tracing Library for TensorFlow☆43Updated 6 years ago
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions☆21Updated 2 years ago
- Master's thesis literature notes (Deep Learning, Scheduling, Distributed, Kubernetes)☆50Updated 5 years ago
- Paper Reading: covers distributed systems, virtualization, networking, and machine learning☆23Updated 4 years ago
- Forked from☆10Updated 4 years ago
- A Kubernetes operator for mxnet jobs☆53Updated 3 years ago
- Fault tolerance for DL frameworks☆69Updated last year
- An Efficient Dynamic Resource Scheduler for Deep Learning Clusters☆42Updated 7 years ago
- CS294-162; Machine Learning Systems Seminar☆31Updated last year
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆19Updated 3 weeks ago
- ☆18Updated 7 years ago
- Elastic Serverless Serving based on Kubernetes, provides 0 instance serving capability.☆10Updated 3 years ago
- Enhanced networking support for TensorFlow. Maintained by SIG-networking.☆98Updated 3 years ago
- Static analysis framework for analyzing programs written in TVM's Relay IR.☆27Updated 5 years ago
- Release doc/tutorial/wheels for poseidon-tf☆10Updated 7 years ago
- SCV is a distributed cluster GPU sniffer.☆21Updated 2 years ago
- Fine-grained GPU sharing primitives☆141Updated 4 years ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra…☆18Updated 2 years ago
- ☆58Updated 4 years ago
- ☆11Updated 3 years ago
- Model-less Inference Serving☆85Updated last year
- ☆83Updated 2 years ago
- DevComm-Shanghai Weekly: joint weekly newsletter of university tech clubs in the Shanghai area (submissions welcome)☆65Updated 2 weeks ago
- ☆44Updated last year
- Building Machine Learning Infrastructure!☆42Updated 6 years ago
- High-performance key-value store☆12Updated 6 years ago
- NVIDIA device plugin for Kubernetes☆15Updated 5 years ago
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph☆51Updated 2 years ago
- Implementation for MIT 6.824 Distributed Systems 2016☆7Updated 8 years ago
- ☆51Updated last year