cake-lab / transient-deep-learning
Repo for the transient training paper at ICAC 2019.
☆11 Updated 2 years ago
Related projects
Alternatives and complementary repositories for transient-deep-learning
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 Updated 2 years ago
- ☆10 Updated last year
- A paper review repo for top-tier computer systems conferences ☆10 Updated 4 years ago
- Deadline-based hyperparameter tuning on RayTune. ☆31 Updated 4 years ago
- ☆13 Updated 5 years ago
- GPU scheduler for elastic/distributed deep learning workloads in Kubernetes clusters (IC2E'23) ☆31 Updated last year
- Network- and GPU-aware management of serverless functions at the edge ☆13 Updated last year
- Validation Generation for Kubeflow CRD on Kubernetes ☆11 Updated 3 years ago
- An Efficient Dynamic Resource Scheduler for Deep Learning Clusters ☆41 Updated 7 years ago
- A new version of Pytheas (formerly DDN), a control platform for enabling data-driven control for network applications ☆14 Updated 7 years ago
- A super 🦄 ☆30 Updated 5 years ago
- ESPBench - The Enterprise Stream Processing Benchmark ☆13 Updated 10 months ago
- Simulated large clusters for Kubernetes scheduler validation. ☆15 Updated last year
- Distributed tracing data from Meta's microservices architecture. ☆17 Updated last year
- Machine learning on serverless platforms ☆8 Updated 2 years ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class. ☆10 Updated 2 years ago
- Efficient set similarity search algorithms implemented in Go ☆29 Updated 2 years ago
- Studying GPU Multi-tenancy ☆12 Updated 5 years ago
- Machine Learning Inference Graph Spec ☆21 Updated 5 years ago
- Example of multi-process, multi-GPU training using Torch-parallel, nVidia-nccl, and nVidia-MPS ☆15 Updated 8 years ago
- High-performance key-value store ☆12 Updated 5 years ago
- Some microbenchmarks and design docs before commencement ☆12 Updated 3 years ago
- Static analysis framework for analyzing programs written in TVM's Relay IR. ☆27 Updated 5 years ago
- Intelligent platform for AI workloads ☆37 Updated last year
- ☆12 Updated 2 years ago
- SCV is a distributed cluster GPU sniffer. ☆21 Updated last year
- Runtime Tracing Library for TensorFlow ☆42 Updated 5 years ago
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆15 Updated 3 years ago