OpenSQZ / MegatronApp
Toolchain built around Megatron-LM for Distributed Training
☆81Updated last month
Alternatives and similar repositories for MegatronApp
Users that are interested in MegatronApp are comparing it to the libraries listed below
- ☆96Updated 9 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs☆256Updated last month
- An NCCL extension library designed to efficiently offload GPU memory allocated by the NCCL communication library.☆77Updated 3 weeks ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.☆83Updated 2 weeks ago
- Allow torch tensor memory to be released and resumed later☆196Updated last month
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…☆250Updated 3 weeks ago
- A simple calculation for LLM MFU.☆58Updated 3 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation☆73Updated last week
- Nex Venus Communication Library☆68Updated last month
- ☆116Updated 7 months ago
- ☆337Updated this week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding☆135Updated last year
- Odysseus: Playground of LLM Sequence Parallelism☆79Updated last year
- Efficient Compute-Communication Overlap for Distributed LLM Inference☆67Updated 2 months ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…☆73Updated 3 months ago
- Pipeline Parallelism Emulation and Visualization☆74Updated 6 months ago
- Autonomous GPU Kernel Generation via Deep Agents☆197Updated last week
- Estimate MFU for DeepSeekV3☆26Updated last year
- ☆84Updated 8 months ago
- ☆65Updated 8 months ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning☆177Updated 2 weeks ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit☆91Updated this week
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference☆102Updated last week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training☆220Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems☆82Updated last year
- PyTorch bindings for CUTLASS grouped GEMM.☆178Updated 3 weeks ago
- ☆72Updated 3 months ago
- ☆153Updated 10 months ago
- DeeperGEMM: crazy optimized version☆74Updated 8 months ago
- ☆52Updated 7 months ago