OpenSQZ / MegatronApp
Toolchain built around the Megatron-LM for Distributed Training
☆79Updated last week
Alternatives and similar repositories for MegatronApp
Users who are interested in MegatronApp are comparing it to the libraries listed below
- A simple calculation for LLM MFU.☆50Updated 3 months ago
- ByteCheckpoint: An Unified Checkpointing Library for LFMs☆256Updated last week
- ☆97Updated 8 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.☆73Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…☆247Updated this week
- Allow torch tensor memory to be released and resumed later☆187Updated 2 weeks ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning☆172Updated this week
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…☆91Updated 3 months ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…☆72Updated 3 months ago
- An NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.☆58Updated 2 weeks ago
- Odysseus: Playground of LLM Sequence Parallelism☆78Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems☆81Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training☆219Updated last year
- ☆329Updated last month
- Pipeline Parallelism Emulation and Visualization☆72Updated 6 months ago
- ☆82Updated 7 months ago
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin…☆109Updated last week
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆119Updated last year
- DeeperGEMM: crazy optimized version☆73Updated 7 months ago
- Autonomous GPU Kernel Generation via Deep Agents☆179Updated this week
- Fast and memory-efficient exact attention☆104Updated this week
- Nex Venus Communication Library☆64Updated 3 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM.☆172Updated 2 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving.☆539Updated this week
- ☆114Updated 6 months ago
- HuggingFace conversion and training library for Megatron-based models☆270Updated this week
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning☆56Updated last month
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆146Updated 2 months ago
- Estimate MFU for DeepSeekV3☆26Updated 11 months ago
- LLM Serving Performance Evaluation Harness☆82Updated 9 months ago