OpenSQZ / MegatronApp
Toolchain built around the Megatron-LM for Distributed Training
☆84 Updated last month
Alternatives and similar repositories for MegatronApp
Users that are interested in MegatronApp are comparing it to the libraries listed below
- ByteCheckpoint: An Unified Checkpointing Library for LFMs ☆264 Updated last month
- A simple calculation for LLM MFU. ☆66 Updated 4 months ago
- Allow torch tensor memory to be released and resumed later ☆207 Updated 2 weeks ago
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆87 Updated last month
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio… ☆81 Updated 4 months ago
- ☆96 Updated 10 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆90 Updated 2 weeks ago
- Pipeline Parallelism Emulation and Visualization ☆76 Updated 3 weeks ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆188 Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆252 Updated last week
- Estimate MFU for DeepSeekV3 ☆26 Updated last year
- Nex Venus Communication Library ☆72 Updated 2 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆185 Updated last month
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆80 Updated last month
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆13 Updated 2 weeks ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference ☆109 Updated last month
- ☆340 Updated 3 weeks ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆98 Updated 5 months ago
- DeeperGEMM: crazy optimized version ☆73 Updated 8 months ago
- Autonomous GPU Kernel Generation via Deep Agents ☆223 Updated this week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆221 Updated last year
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆156 Updated last week
- Odysseus: Playground of LLM Sequence Parallelism ☆79 Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆82 Updated last year
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆92 Updated this week
- ☆65 Updated 9 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆62 Updated 2 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆124 Updated 4 months ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆563 Updated last week
- PyTorch bindings for CUTLASS grouped GEMM. ☆141 Updated 8 months ago