Thesys-lab / parity-models
Learning-Based Coded Computation
☆46 Updated 2 years ago
Related projects
Alternatives and complementary repositories for parity-models
- Code for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23] ☆38 Updated last year
- ☆69 Updated last year
- ☆43 Updated 3 years ago
- The prototype for the NSDI paper "NetHint: White-Box Networking for Multi-Tenant Data Centers" ☆25 Updated 9 months ago
- Dorylus: Affordable, Scalable, and Accurate GNN Training ☆77 Updated 3 years ago
- Phoenix dataplane system service ☆51 Updated 5 months ago
- Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23) ☆9 Updated last year
- A Federated Execution Engine for Fast Distributed Computation Over Slow Networks ☆26 Updated 3 years ago
- A Rust-based benchmark for BlueField SmartNICs. ☆27 Updated last year
- Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '2… ☆15 Updated last year
- Analyze network performance in distributed training ☆16 Updated 4 years ago
- A random collection of research papers and projects that interest me ☆20 Updated 3 years ago
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling ☆13 Updated last year
- [NSDI 2023] TopoOpt: Optimizing the Network Topology for Distributed DNN Training ☆26 Updated 2 months ago
- TRAGEN: A Synthetic Trace Generator for Realistic Cache Simulations ☆20 Updated 7 months ago
- ☆13 Updated 2 years ago
- ☆23 Updated last year
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆34 Updated last year
- Reading seminar in the Harvard Cloud Networking and Systems Group ☆16 Updated 2 years ago
- ☆34 Updated 4 months ago
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c… ☆23 Updated last year
- ☆20 Updated 7 years ago
- Benchmark Suite for RDMA Performance Isolation ☆36 Updated last year
- EuroSys '24: "Trinity: A Fast Compressed Multi-attribute Data Store" ☆17 Updated 6 months ago
- ☆51 Updated 3 years ago
- This repository contains code for the paper: Bergsma S., Zeyl T., Senderovich A., and Beck J. C., "Generating Complex, Realistic Cloud Wo… ☆42 Updated 3 years ago
- Aequitas enables RPC-level QoS in datacenter networks. ☆16 Updated 2 years ago
- ☆18 Updated 4 years ago
- Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances. ☆47 Updated last year
- Proactive-adaptive arbitration between shipping compute and shipping data ☆18 Updated 3 years ago