meta-pytorch/monarch

PyTorch Single Controller

☆1,060

Alternatives and similar repositories for monarch

Users who are interested in monarch are comparing it to the libraries listed below.

meta-pytorch / torchforge
PyTorch-native post-training at scale
☆697 · Updated this week
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆526 · Jul 16, 2026 · Updated last week
meta-pytorch / torchstore
A storage solution for PyTorch tensors with distributed tensor support.
☆81 · Jul 17, 2026 · Updated last week
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆380 · Updated this week
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆914 · Updated this week
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆980 · Jul 4, 2026 · Updated 2 weeks ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,070 · Updated this week
pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆5,556 · Updated this week
meta-pytorch / autoparallel
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
☆89 · Updated this week
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,498 · Updated this week
radixark / miles
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆1,784 · Updated this week
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆428 · Updated this week
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,088 · Updated this week
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,912 · Updated this week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,563 · Jul 13, 2026 · Updated last week
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 3 weeks ago
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,574 · Updated this week
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆593 · Nov 7, 2025 · Updated 8 months ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345 · Aug 28, 2025 · Updated 10 months ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,621 · Updated this week
NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆1,848 · Updated this week
uccl-project / uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g…
☆1,470 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆6,018 · Updated this week
PrimeIntellect-ai / prime-rl
Agentic RL Training at Scale
☆1,723 · Updated this week
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,149 · Updated this week
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆157 · Sep 12, 2025 · Updated 10 months ago
linkedin / Liger-Kernel
Efficient Triton Kernels for LLM Training
☆6,533 · Updated this week
ByteDance-Seed / VeOmni
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
☆2,107 · Updated this week
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,764 · May 26, 2026 · Updated last month
meta-pytorch / torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…
☆427 · Updated this week
mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,390 · Updated this week
vllm-project / vime
An LLM post-training framework with vLLM for RL Scaling
☆383 · Updated this week
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆563 · Updated this week
huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for education purpose
☆2,256 · Aug 26, 2025 · Updated 10 months ago
yifuwang / symm-mem-recipes
☆170 · Dec 27, 2024 · Updated last year
volcengine / veScale
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,033 · Mar 3, 2026 · Updated 4 months ago
stepfun-ai / StepMesh
☆378 · Jan 28, 2026 · Updated 5 months ago
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆260 · Updated this week
NVIDIA / nvidia-resiliency-ext
NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the …
☆311 · Updated this week