inclusionAI / asystem-astate
☆36Updated 2 months ago
Alternatives and similar repositories for asystem-astate
Users that are interested in asystem-astate are comparing it to the libraries listed below
- An NCCL extension library designed to efficiently offload GPU memory allocated by the NCCL communication library.☆90Updated last month
- ByteCheckpoint: A Unified Checkpointing Library for LFMs☆269Updated last week
- ☆342Updated 2 weeks ago
- Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.☆57Updated 5 months ago
- Building the Virtuous Cycle for AI-driven LLM Systems☆164Updated this week
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…☆13Updated 3 weeks ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank☆69Updated last year
- Efficient Long-context Language Model Training by Core Attention Disaggregation☆87Updated 2 weeks ago
- ☆73Updated 4 months ago
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin…☆131Updated last month
- A simple calculation for LLM MFU.☆66Updated 5 months ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…☆83Updated 5 months ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning☆191Updated last week
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.☆92Updated 3 weeks ago
- Allow torch tensor memory to be released and resumed later☆216Updated 3 weeks ago
- Pipeline Parallelism Emulation and Visualization☆77Updated last month
- Nex Venus Communication Library☆72Updated 2 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25]☆191Updated this week
- Scalable long-context LLM decoding that leverages sparsity—by treating the KV cache as a vector storage system.☆122Updated last month
- ☆77Updated last year
- ☆47Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems☆82Updated last year
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive☆66Updated 2 months ago
- Learning TileLang with 10 puzzles!☆118Updated last week
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra…☆47Updated 4 months ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference☆112Updated last month
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable☆209Updated last year
- Toolchain built around the Megatron-LM for Distributed Training☆86Updated 2 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding☆87Updated 2 months ago
- ☆47Updated last year