ISEEKYAN/mbridge

Bridge Megatron-Core to Hugging Face/Reinforcement Learning

☆226

Alternatives and similar repositories for mbridge

Users that are interested in mbridge are comparing it to the libraries listed below.

ISEEKYAN / verl_megatron_practice
(best/better) practices of megatron on veRL and tuning guide
☆137 • May 12, 2026 • Updated 2 months ago
NVIDIA-NeMo / Megatron-Bridge
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
☆829 • Updated this week
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆261 • Updated this week
TransferQueue / TransferQueue
[Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…
☆16 • Jan 16, 2026 • Updated 6 months ago
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆201 • May 29, 2026 • Updated last month
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆192 • Apr 8, 2026 • Updated 3 months ago
alibaba / ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
☆452 • Oct 23, 2025 • Updated 9 months ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,629 • Updated this week
inclusionAI / asystem-amem
A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆113 • Dec 17, 2025 • Updated 7 months ago
alibaba / Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
☆1,585 • Dec 15, 2025 • Updated 7 months ago
radixark / miles
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆1,789 • Updated this week
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆982 • Jul 4, 2026 • Updated 3 weeks ago
ByteDance-Seed / VeOmni
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
☆2,106 • Updated this week
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆1,037 • Sep 10, 2025 • Updated 10 months ago
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆85 • Jun 30, 2026 • Updated 3 weeks ago
fzyzcjy / torch_utils
Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…
☆114 • Sep 11, 2025 • Updated 10 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,448 • Updated this week
areal-project / AReaL
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
☆5,599 • Updated this week
redai-infra / Relax
An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
☆538 • Updated this week
OpenSQZ / MegatronApp
Toolchain built around the Megatron-LM for Distributed Training
☆97 • May 20, 2026 • Updated 2 months ago
feifeibear / long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆682 • May 21, 2026 • Updated 2 months ago
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆888 • Updated this week
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 • Jul 4, 2026 • Updated 3 weeks ago
Ascend / TransferQueue
An asynchronous streaming data management module for efficient post-training.
☆119 • Jul 12, 2026 • Updated 2 weeks ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,498 • Updated this week
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345 • Aug 28, 2025 • Updated 10 months ago
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆146 • Nov 10, 2025 • Updated 8 months ago
NVIDIA-NeMo / Automodel
🚀 Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆768 • Updated this week
verl-project / verl-recipe
A set of examples based on verl for end-to-end RL training recipes.
☆311 • Updated this week
vllm-project / vime
An LLM post-training framework with vLLM for RL Scaling
☆385 • Updated this week
alibaba / ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
☆3,325 • Updated this week
NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆1,849 • Updated this week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆1,009 • Updated this week
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆464 • May 7, 2025 • Updated last year
volcengine / veScale
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,033 • Mar 3, 2026 • Updated 4 months ago
nex-agi / NexRL
NexRL is an ultra-loosely-coupled LLM post-training framework.
☆114 • Updated this week
NVIDIA-NeMo / labs-molt
☆579 • Updated this week
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,772 • Updated this week
kwai / Megatron-Kwai
LLM training technologies developed by kwai
☆71 • Jun 30, 2026 • Updated 3 weeks ago