pytorch-labs / monarch

PyTorch Single Controller

☆345

Alternatives and similar repositories for monarch

Users who are interested in monarch are comparing it to the libraries listed below.

pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆372 Updated this week
huggingface / picotron_tutorial
☆206 Updated 5 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆188 Updated 2 months ago
jax-ml / scaling-book
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
☆445 Updated last week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258 Updated last week
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
☆505 Updated this week
facebookresearch / spdl
Scalable and Performant Data Loading
☆291 Updated this week
rwitten / HighPerfLLMs2024
☆518 Updated last year
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆565 Updated this week
microsoft / dion
Dion optimizer algorithm
☆193 Updated this week
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆137 Updated last year
huggingface / kernels
Load compute kernels from the Hub
☆220 Updated this week
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆48 Updated this week
Deep-Learning-Profiling-Tools / triton-viz
☆227 Updated last week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆206 Updated last week
NVIDIA / kvpress
LLM KV cache compression made easy
☆566 Updated this week
pyember / ember
☆209 Updated last month
gpu-mode / ring-attention
ring-attention experiments
☆146 Updated 9 months ago
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
gpu-mode / profiling-cuda-in-torch
☆162 Updated last year
huggingface / gpu-fryer
Where GPUs get cooked 👩‍🍳🔥
☆266 Updated this week
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆338 Updated last week
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆205 Updated 3 months ago
NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆578 Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆388 Updated this week
google / aqt
☆323 Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247 Updated 6 months ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆165 Updated 2 weeks ago
LambdaLabsML / distributed-training-guide
Best practices & guides on how to write distributed pytorch training code
☆463 Updated 5 months ago
marin-community / marin
☆347 Updated this week