ByteDance-Seed / StragglerAnalysis
☆43 Updated 6 months ago
Alternatives and similar repositories for StragglerAnalysis
Users who are interested in StragglerAnalysis are comparing it to the libraries listed below.
- Efficient Compute-Communication Overlap for Distributed LLM Inference ☆61 Updated this week
- ☆74 Updated 2 weeks ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆73 Updated last week
- A resilient distributed training framework ☆96 Updated last year
- A lightweight design for computation-communication overlap. ☆182 Updated 3 weeks ago
- Stateful LLM Serving ☆87 Updated 7 months ago
- A framework for generating realistic LLM serving workloads ☆73 Updated 3 weeks ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆61 Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆142 Updated last month
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆189 Updated last year
- ☆63 Updated last month
- ☆64 Updated 6 months ago
- Microsoft Collective Communication Library ☆67 Updated 11 months ago
- ☆79 Updated 6 months ago
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆42 Updated last month
- NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading ☆67 Updated 4 months ago
- ☆67 Updated 9 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆118 Updated last month
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆131 Updated last year
- Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. ☆97 Updated last month
- DeeperGEMM: crazy optimized version ☆72 Updated 5 months ago
- ☆310 Updated last month
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆65 Updated last week
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport ☆66 Updated 5 months ago
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221 ☆26 Updated 6 months ago
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines ☆20 Updated last year
- ☆124 Updated 11 months ago
- ☆53 Updated last week
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆67 Updated 7 months ago
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems. ☆24 Updated 3 weeks ago