NVIDIA-NeMo / Megatron-Bridge
Training library for Megatron-based models
☆142 · Updated this week
Alternatives and similar repositories for Megatron-Bridge
Users interested in Megatron-Bridge are also comparing it to the libraries listed below.
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆216 · Updated last year
- Megatron's multi-modal data loader ☆256 · Updated last week
- PyTorch DTensor-native training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆141 · Updated this week
- ☆130 · Updated 5 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆78 · Updated last year
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆261 · Updated last month
- Triton-based implementation of Sparse Mixture of Experts. ☆246 · Updated 3 weeks ago
- Best practices for training DeepSeek, Mixtral, Qwen, and other MoE models using Megatron Core. ☆114 · Updated 2 weeks ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆270 · Updated 3 months ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆143 · Updated this week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆243 · Updated 2 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆249 · Updated 3 months ago
- Async pipelined version of Verl ☆123 · Updated 6 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆142 · Updated last year
- 🔥 A minimal training framework for scaling FLA models ☆273 · Updated last month
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆52 · Updated 3 months ago
- Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆148 · Updated 2 weeks ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆68 · Updated 2 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆197 · Updated 4 months ago
- ☆145 · Updated 8 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆191 · Updated 3 weeks ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆130 · Updated 10 months ago
- ☆285 · Updated 3 months ago
- ☆112 · Updated last year
- Efficient Triton implementation of Native Sparse Attention. ☆241 · Updated 5 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆125 · Updated 5 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆161 · Updated 3 weeks ago
- The evaluation framework for training-free sparse attention in LLMs ☆102 · Updated 2 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆169 · Updated last year
- Best/better practices for Megatron on veRL, with a tuning guide ☆98 · Updated last month