OpenSQZ / MegatronApp
Toolchain built around Megatron-LM for distributed training
☆76Updated this week
Alternatives and similar repositories for MegatronApp
Users who are interested in MegatronApp are comparing it to the libraries listed below:
- ByteCheckpoint: An Unified Checkpointing Library for LFMs☆252Updated 4 months ago
- A simple calculation for LLM MFU.☆50Updated 2 months ago
- ☆97Updated 7 months ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning☆159Updated last week
- Odysseus: Playground of LLM Sequence Parallelism☆78Updated last year
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…☆81Updated 2 months ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…☆68Updated 2 months ago
- Allow torch tensor memory to be released and resumed later☆175Updated last week
- Training library for Megatron-based models☆209Updated this week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training☆218Updated last year
- Estimate MFU for DeepSeekV3☆26Updated 10 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.☆68Updated 2 weeks ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning☆52Updated 3 weeks ago
- ☆320Updated last week
- PyTorch bindings for CUTLASS grouped GEMM.☆167Updated last month
- Implementation for FP8/INT8 Rollout for RL training without performance drop.☆275Updated 2 weeks ago
- ☆81Updated 7 months ago
- Pipeline Parallelism Emulation and Visualization☆70Updated 5 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…☆232Updated this week
- ☆67Updated 2 months ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.☆128Updated 2 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM.☆130Updated 5 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems☆80Updated last year
- Triton-based Symmetric Memory operators and examples☆63Updated last month
- ☆132Updated 5 months ago
- Sequence-level 1F1B schedule for LLMs.☆18Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank☆64Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding☆131Updated 11 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters☆52Updated last year
- ☆109Updated 6 months ago