ailzhang / minPP
Pipeline parallelism for the minimalist
☆35 · Updated 2 months ago
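For readers new to the idea, a minimal single-process sketch of GPipe-style pipelining is shown below: the model is split into stages and the batch into microbatches, so that one stage can work on microbatch i while the next stage works on microbatch i-1. This is an illustration only, not minPP's actual API; in a real run each stage lives on its own device/rank and activations move over point-to-point communication (e.g. torch.distributed.send/recv).

```python
import torch
import torch.nn as nn

# Split one model into two sequential stages (illustrative names, not minPP's API).
stage0 = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
stage1 = nn.Sequential(nn.Linear(32, 8))

batch = torch.randn(8, 16)
microbatches = batch.chunk(4)  # 4 microbatches of size 2

# GPipe-style schedule, flattened into a single process:
# each microbatch flows through stage0 and then stage1.
stage0_acts = [stage0(mb) for mb in microbatches]
outputs = [stage1(act) for act in stage0_acts]

loss = torch.cat(outputs).pow(2).mean()
loss.backward()  # gradients flow back through both stages
print(loss.item())
```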
Alternatives and similar repositories for minPP
Users interested in minPP are comparing it to the libraries listed below.
- [WIP] Better (FP8) attention for Hopper ☆33 · Updated 8 months ago
- How to ensure correctness and ship LLM-generated kernels in PyTorch ☆111 · Updated this week
- PyTorch DTensor-native training library for LLMs/VLMs with OOTB Hugging Face support ☆141 · Updated this week
- torchcomms: a modern PyTorch communications API ☆219 · Updated this week
- 👷 Build compute kernels ☆163 · Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆216 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆58 · Updated 3 weeks ago
- ring-attention experiments (the blockwise attention math it distributes is sketched after this list) ☆155 · Updated last year
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆55 · Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆223 · Updated last year
- Extensible collectives library in Triton ☆90 · Updated 7 months ago
- ☆103 · Updated last week
- A collection of reproducible inference engine benchmarks ☆37 · Updated 6 months ago
- Fast low-bit matmul kernels in Triton ☆388 · Updated last week
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆138 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆441 · Updated last week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆195 · Updated last week
- Load compute kernels from the Hub ☆308 · Updated last week
- ☆218 · Updated 9 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆85 · Updated last year
- Make Triton easier ☆48 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆65 · Updated last week
- ☆71 · Updated 7 months ago
- PTX tutorial written purely by AIs (OpenAI's Deep Research and Claude 3.7) ☆66 · Updated 7 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆233 · Updated 5 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆147 · Updated 2 years ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆183 · Updated this week
- ☆174 · Updated last year
- ☆89 · Updated last year
- A block-oriented training approach for inference-time optimization. ☆33 · Updated last year
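The ring-attention entry above builds on blockwise attention with an online softmax: each device holds one K/V chunk and passes it around a ring while keeping running statistics. The single-device sketch below shows only that accumulation math (function and variable names are ours, not the repository's) and checks it against PyTorch's reference attention.

```python
import torch

def chunked_attention(q, k, v, chunk=128):
    # Attention over K/V processed chunk by chunk with a running (online) softmax,
    # the same accumulation that ring attention spreads across devices.
    scale = q.shape[-1] ** -0.5
    m = torch.full(q.shape[:-1], float("-inf"))  # running max of attention logits
    l = torch.zeros(q.shape[:-1])                # running softmax denominator
    acc = torch.zeros_like(q)                    # running weighted sum of V
    for k_c, v_c in zip(k.split(chunk, dim=-2), v.split(chunk, dim=-2)):
        s = (q @ k_c.transpose(-1, -2)) * scale  # logits against this K chunk
        m_new = torch.maximum(m, s.amax(dim=-1))
        p = torch.exp(s - m_new.unsqueeze(-1))
        correction = torch.exp(m - m_new)        # rescale old stats to the new max
        l = l * correction + p.sum(dim=-1)
        acc = acc * correction.unsqueeze(-1) + p @ v_c
        m = m_new
    return acc / l.unsqueeze(-1)

q, k, v = (torch.randn(2, 4, 256, 64) for _ in range(3))  # (batch, heads, seq, dim)
out = chunked_attention(q, k, v)
ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(torch.allclose(out, ref, atol=1e-4))  # chunked result matches full attention
```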