alibaba / llm-scheduling-artifact
Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆64 · Jun 5, 2024 · Updated last year
Alternatives and similar repositories for llm-scheduling-artifact
Users who are interested in llm-scheduling-artifact are comparing it to the repositories listed below.
- Efficient and easy multi-instance LLM serving ☆527 · Sep 3, 2025 · Updated 5 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆209 · Sep 21, 2024 · Updated last year
- Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation ☆40 · Nov 10, 2025 · Updated 3 months ago
- ☆131 · Nov 11, 2024 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆474 · Jan 8, 2026 · Updated last month
- ☆13 · Feb 6, 2026 · Updated last week
- ☆18 · Mar 4, 2025 · Updated 11 months ago
- an implementation of parallel skills like amp, ddp, pp, tp for learning purposes ☆14 · Nov 18, 2023 · Updated 2 years ago
- Compiler for Dynamic Neural Networks ☆45 · Nov 13, 2023 · Updated 2 years ago
- Disaggregated serving system for Large Language Models (LLMs). ☆776 · Apr 6, 2025 · Updated 10 months ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆19 · May 28, 2024 · Updated last year
- ☆630 · Jan 14, 2026 · Updated last month
- An experimental parallel training platform ☆56 · Mar 25, 2024 · Updated last year
- Resource Allocation for Dynamic Demands ☆22 · Dec 26, 2023 · Updated 2 years ago
- ☆21 · May 13, 2022 · Updated 3 years ago
- A throughput-oriented high-performance serving framework for LLMs ☆945 · Oct 29, 2025 · Updated 3 months ago
- ☆20 · Jun 3, 2023 · Updated 2 years ago
- ☆84 · Dec 2, 2022 · Updated 3 years ago
- ☆12 · Jan 12, 2024 · Updated 2 years ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆458 · May 30, 2025 · Updated 8 months ago
- ☆323 · Jan 22, 2024 · Updated 2 years ago
- A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of … ☆314 · Jun 10, 2025 · Updated 8 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆135 · Feb 22, 2024 · Updated last year
- A large-scale simulation framework for LLM inference ☆535 · Jul 25, 2025 · Updated 6 months ago
- TQT's pytorch implementation. ☆21 · Dec 17, 2021 · Updated 4 years ago
- ☆13 · May 10, 2024 · Updated last year
- Distributed, Replicated, Protocol-generic Key-value Store in Async Rust For SMR Protocols Research ☆17 · Feb 1, 2026 · Updated 2 weeks ago
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆15 · Sep 18, 2020 · Updated 5 years ago
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation ☆22 · May 29, 2025 · Updated 8 months ago
- ☆11 · Jul 9, 2023 · Updated 2 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆25 · Nov 21, 2024 · Updated last year
- High performance Transformer implementation in C++. ☆151 · Jan 18, 2025 · Updated last year
- ☆44 · Sep 6, 2021 · Updated 4 years ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆56 · May 29, 2024 · Updated last year
- Arrow Matrix Decomposition - Communication-Efficient Distributed Sparse Matrix Multiplication ☆15 · Mar 25, 2024 · Updated last year
- Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning" [NSDI '23] ☆46 · Nov 24, 2022 · Updated 3 years ago
- ☆24 · May 9, 2025 · Updated 9 months ago
- 2nd place solution of ECCV 2020 workshop VIPriors Image Classification Challenge, https://arxiv.org/abs/2008.00261 ☆13 · Aug 22, 2021 · Updated 4 years ago
- ☆10 · Nov 14, 2023 · Updated 2 years ago