pytorch-labs / monarch
PyTorch Single Controller
⭐231 · Updated this week
Alternatives and similar repositories for monarch
Users interested in monarch are comparing it to the libraries listed below.
- PyTorch per step fault tolerance (actively under development) · ⭐329 · Updated this week
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐253 · Updated this week
- ⭐193 · Updated 4 months ago
- Scalable and Performant Data Loading · ⭐278 · Updated this week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand · ⭐185 · Updated 3 weeks ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ⭐204 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ⭐189 · Updated last month
- Collection of kernels written in the Triton language · ⭐132 · Updated 2 months ago
- Load compute kernels from the Hub · ⭐191 · Updated this week
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. · ⭐134 · Updated last year
- Cataloging released Triton kernels. · ⭐238 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton · ⭐322 · Updated last week
- ⭐219 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. · ⭐556 · Updated last week
- Scalable toolkit for efficient model reinforcement · ⭐448 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX · ⭐224 · Updated 10 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! · ⭐46 · Updated this week
- KernelBench: Can LLMs Write GPU Kernels? A benchmark with Torch -> CUDA problems · ⭐425 · Updated 3 weeks ago
- ring-attention experiments · ⭐144 · Updated 8 months ago
- Applied AI experiments and examples for PyTorch · ⭐277 · Updated 3 weeks ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. · ⭐166 · Updated this week
- Extensible collectives library in Triton · ⭐86 · Updated 2 months ago
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs · ⭐399 · Updated 2 weeks ago
- LLM KV cache compression made easy · ⭐520 · Updated this week
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… · ⭐157 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) · ⭐130 · Updated this week
- TorchFix: a linter for PyTorch-using code with autofix support · ⭐143 · Updated 4 months ago
- ⭐159 · Updated last year
- seqax = sequence modeling + JAX · ⭐162 · Updated last week
- ⭐270 · Updated 11 months ago