meta-pytorch/autoparallel

meta-pytorch / autoparallel

An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.

☆89

Alternatives and similar repositories for autoparallel

Users that are interested in autoparallel are comparing it to the libraries listed below.

meta-pytorch / spmd_types
This module defines a type system for distributed training code, based off of JAX's sharding in types, but adapted for the PyTorch ecosys…
☆34 · Updated this week
meta-pytorch / remat
torch_remat: a fine-grained activation checkpointing API
☆15 · Updated this week
meta-pytorch / tlparse
TORCH_TRACE parser for PT2
☆90 · May 11, 2026 · Updated 2 months ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆377 · Updated this week
meta-pytorch / monarch
PyTorch Single Controller
☆1,060 · Updated this week
meta-pytorch / torchstore
A storage solution for PyTorch tensors with distributed tensor support.
☆80 · Updated this week
openxla / shardy
MLIR-based partitioning system
☆198 · Updated this week
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆910 · Updated this week
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆106 · May 15, 2026 · Updated 2 months ago
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆523 · Updated this week
meta-pytorch / MSLK
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…
☆121 · Updated this week
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
mlc-ai / pith-train
Compact and Agent-Native MoE Training System
☆290 · Updated this week
deepseek-ai / LPLB
An early-research-stage expert-parallel load balancer for MoE models, based on linear programming.
☆516 · Nov 19, 2025 · Updated 8 months ago
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆251 · Jun 21, 2026 · Updated 3 weeks ago
meta-pytorch / torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…
☆427 · Updated this week
flashinfer-ai / cubloaty
A size profiler for CUDA binaries
☆71 · Jan 15, 2026 · Updated 6 months ago
albanD / subclass_zoo
☆192 · Jun 16, 2024 · Updated 2 years ago
NVIDIA / jax-tvm-ffi
JAX support for the tvm-ffi ABI
☆26 · May 14, 2026 · Updated 2 months ago
togethercomputer / ParallelKernelBench
☆41 · Jul 1, 2026 · Updated 2 weeks ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 2 weeks ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆165 · Jun 10, 2026 · Updated last month
meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆151 · Jan 14, 2026 · Updated 6 months ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,063 · Updated this week
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆361 · Updated this week
ezyang / cute-interactive
Interactive version of the CuTe layout paper
☆57 · Apr 14, 2026 · Updated 3 months ago
Dao-AILab / gram-newton-schulz
Fast Polar Decomposition for Muon
☆166 · Jul 2, 2026 · Updated 2 weeks ago
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,344 · Aug 28, 2025 · Updated 10 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆191 · Apr 8, 2026 · Updated 3 months ago
yifuwang / symm-mem-recipes
☆170 · Dec 27, 2024 · Updated last year
cchan / tccl
Extensible collectives library in Triton
☆97 · Mar 31, 2025 · Updated last year
Farseer-Scaling-Law / Farseer
☆21 · Jun 12, 2025 · Updated last year
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆560 · Updated this week
AI-Hypercomputer / accelerator-agents
☆46 · Updated this week
thuml / depyf
depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
☆815 · Oct 13, 2025 · Updated 9 months ago
ADAPT-uiuc / TensorRight
TensorRight: Automated Verification of Tensor Graph Rewrites
☆24 · Nov 9, 2025 · Updated 8 months ago
volcengine / veScale
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,031 · Mar 3, 2026 · Updated 4 months ago
inclusionAI / asystem-amem
An NCCL extension library designed to efficiently offload GPU memory allocated by the NCCL communication library.
☆110 · Dec 17, 2025 · Updated 7 months ago
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆178 · Updated this week