NVIDIA-NeMo / Megatron-Bridge

Training library for Megatron-based models

☆193

Alternatives and similar repositories for Megatron-Bridge

Users who are interested in Megatron-Bridge are comparing it to the libraries listed below.

RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆217 Updated last year
NVIDIA-NeMo / Automodel
PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆179 Updated this week
yaof20 / Flash-RL
Implementation of FP8/INT8 rollout for RL training without performance drop.
☆269 Updated 2 weeks ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: A Unified Checkpointing Library for LFMs
☆252 Updated 4 months ago
NVIDIA / Megatron-Energon
Megatron's multi-modal data loader
☆272 Updated this week
ISEEKYAN / mbridge
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆159 Updated last week
radixark / miles
☆199 Updated this week
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆128 Updated last week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆272 Updated 2 weeks ago
agentica-project / verl-pipeline
Async pipelined version of Verl
☆125 Updated 7 months ago
Dao-AILab / grouped-latent-attention
☆130 Updated 5 months ago
FasterDecoding / REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆210 Updated 2 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆131 Updated 11 months ago
openpsi-project / ReaLHF
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
☆323 Updated 6 months ago
XunhaoLai / native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
☆248 Updated 5 months ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆248 Updated last month
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆249 Updated 3 months ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆353 Updated 4 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆167 Updated last month
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆308 Updated last week
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆154 Updated last month
FasterDecoding / SnapKV
☆290 Updated 4 months ago
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆910 Updated 2 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] A simple extension on vLLM to help you speed up reasoning models without training.
☆206 Updated 5 months ago
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆501 Updated 9 months ago
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆99 Updated last week
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78 Updated last year
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆194 Updated last month
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆112 Updated 8 months ago
rlite-project / RLite
A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…
☆81 Updated 2 months ago