noise-lab / ml-systems
Machine Learning for Computer Systems
☆11 Updated last week
Related projects
Alternatives and complementary repositories for ml-systems
- [ACM SIGCOMM 2024] "m3: Accurate Flow-Level Performance Estimation using Machine Learning" by Chenning Li*, Arash Nasr-Esfahany*, Kevin Z… ☆17 Updated last month
- Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23) ☆9 Updated last year
- ☆10 Updated 6 months ago
- ☆17 Updated 2 months ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆11 Updated 5 months ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 Updated last year
- Network Traffic Transformer to learn network dynamics from packet traces. Learn fundamental dynamics with pre-training and fine-tune to m… ☆18 Updated 9 months ago
- 📜 [NeurIPS 2022] "Symbolic Distillation for Learned TCP Congestion Control", S P Sharan, Wenqing Zheng, Kuo-Feng Hsu, Jiarong Xing, Ang … ☆12 Updated 2 years ago
- THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression ☆14 Updated 3 months ago
- ☆12 Updated 9 months ago
- ☆8 Updated last month
- Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2… ☆15 Updated last year
- Switch-based Training Acceleration for Machine Learning (SwitchML) ☆14 Updated 3 years ago
- ☆32 Updated 4 months ago
- ☆34 Updated last year
- ☆22 Updated last year
- Codebase for FIGRET (SIGCOMM 2024) ☆11 Updated last month
- ☆12 Updated 4 years ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆33 Updated last year
- ☆21 Updated last year
- ☆9 Updated 11 months ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆19 Updated 8 months ago
- MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters ☆14 Updated last year
- This repository contains code for the paper: Bergsma S., Zeyl T., Senderovich A., and Beck J. C., "Generating Complex, Realistic Cloud Wo… ☆42 Updated 3 years ago
- ☆10 Updated 3 years ago
- Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale ☆17 Updated 4 years ago
- ☆20 Updated 3 months ago
- A reinforcement learning algorithm for congestion control, together with a realistic OMNeT++ network simulation environment ☆20 Updated last year
- [ACM SoCC'22] Pisces: Efficient Federated Learning via Guided Asynchronous Training ☆12 Updated 11 months ago
- Managed collective communication service ☆12 Updated 2 months ago