ByteDance-Seed / StragglerAnalysis
☆42Updated 5 months ago
Alternatives and similar repositories for StragglerAnalysis
Users who are interested in StragglerAnalysis are comparing it to the libraries listed below.
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit☆67Updated this week
- ☆72Updated last year
- Efficient Compute-Communication Overlap for Distributed LLM Inference☆58Updated last week
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond☆99Updated last week
- Stateful LLM Serving☆85Updated 7 months ago
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]☆38Updated 7 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank☆60Updated 11 months ago
- A resilient distributed training framework☆95Updated last year
- DeeperGEMM: crazy optimized version☆72Updated 5 months ago
- A lightweight design for computation-communication overlap.☆179Updated 3 weeks ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆135Updated 3 weeks ago
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport☆63Updated 5 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable☆185Updated last year
- ☆65Updated 5 months ago
- Microsoft Collective Communication Library☆66Updated 10 months ago
- A framework for generating realistic LLM serving workloads☆65Updated 2 weeks ago
- ☆50Updated 4 months ago
- ☆60Updated 9 months ago
- nnScaler: Compiling DNN models for Parallel Training☆118Updated 3 weeks ago
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines☆20Updated last year
- NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading☆64Updated 3 months ago
- ☆46Updated 10 months ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]☆25Updated 10 months ago
- ☆57Updated 3 weeks ago
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221☆25Updated 5 months ago
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling☆41Updated 2 weeks ago
- Dynamic resource changes for multi-dimensional parallelism training☆28Updated last month
- ☆21Updated last year
- SpotServe: Serving Generative Large Language Models on Preemptible Instances☆129Updated last year
- ☆132Updated last year