meta-pytorch / monarch
PyTorch Single Controller
☆423 Updated this week
Alternatives and similar repositories for monarch
Users who are interested in monarch are comparing it to the libraries listed below.
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆404 Updated last week
- Load compute kernels from the Hub ☆287 Updated this week
- ☆220 Updated 7 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆276 Updated last month
- ☆725 Updated 2 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆573 Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆194 Updated 3 months ago
- Scalable and Performant Data Loading ☆304 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆142 Updated last year
- Dion optimizer algorithm ☆347 Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆268 Updated 2 months ago
- LLM KV cache compression made easy ☆613 Updated this week
- 👷 Build compute kernels ☆147 Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆576 Updated last month
- A Quirky Assortment of CuTe Kernels ☆582 Updated this week
- ☆224 Updated 3 months ago
- ☆237 Updated last week
- ring-attention experiments ☆152 Updated 11 months ago
- Applied AI experiments and examples for PyTorch ☆296 Updated last month
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆623 Updated last week
- Fast low-bit matmul kernels in Triton ☆371 Updated last week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆307 Updated last week
- Simple MPI implementation for prototyping or learning ☆279 Updated last month
- ☆535 Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆212 Updated last week
- ☆217 Updated 8 months ago
- Learn CUDA with PyTorch ☆84 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆57 Updated this week
- Simple & Scalable Pretraining for Neural Architecture Research ☆294 Updated last month
- Cataloging released Triton kernels. ☆260 Updated 2 weeks ago