MLSysOps / InfraGym
Empowering LLM Agents for Real-World Computer System Optimization
☆15 Updated 4 months ago
Alternatives and similar repositories for InfraGym
Users who are interested in InfraGym are comparing it to the libraries listed below.
- ☆49 Updated 8 months ago
- ☆84 Updated 8 months ago
- LLM Serving Performance Evaluation Harness ☆82 Updated 10 months ago
- Stateful LLM Serving ☆93 Updated 9 months ago
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221 ☆27 Updated 8 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆66 Updated last year
- A simple calculation for LLM MFU. ☆58 Updated 3 months ago
- ☆72 Updated 3 months ago
- Summary of system papers/frameworks/codes/tools on training or serving large model ☆57 Updated 2 years ago
- ☆96 Updated 9 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆73 Updated last week
- ☆39 Updated 5 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆206 Updated last year
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆47 Updated last year
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ☆45 Updated 2 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆91 Updated this week
- Efficient Compute-Communication Overlap for Distributed LLM Inference ☆67 Updated 2 months ago
- Fast and memory-efficient exact attention ☆15 Updated this week
- ☆102 Updated last year
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference ☆102 Updated last week
- ☆52 Updated 7 months ago
- ☆81 Updated 2 months ago
- ByteCheckpoint: An Unified Checkpointing Library for LFMs ☆256 Updated last month
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] ☆48 Updated 10 months ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆144 Updated 2 weeks ago
- ☆69 Updated this week
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio… ☆76 Updated 3 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆151 Updated 3 months ago
- ☆45 Updated last year
- ☆65 Updated 8 months ago