ailzhang / minPP
Pipeline parallelism for the minimalist
☆37 · Updated 4 months ago
Alternatives and similar repositories for minPP
Users interested in minPP are comparing it to the libraries listed below.
- PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆214 · Updated this week
- 👷 Build compute kernels ☆195 · Updated this week
- Load compute kernels from the Hub ☆352 · Updated last week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆61 · Updated last week
- LM engine is a library for pretraining/finetuning LLMs ☆102 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆64 · Updated last week
- Ship correct and fast LLM kernels to PyTorch ☆127 · Updated last week
- ring-attention experiments ☆160 · Updated last year
- ☆219 · Updated 11 months ago
- Memory-optimized Mixture of Experts ☆72 · Updated 5 months ago
- Extensible collectives library in Triton ☆91 · Updated 8 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆90 · Updated last year
- TPU inference for vLLM, with unified JAX and PyTorch support ☆202 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆462 · Updated this week
- Easy, fast, and scalable multimodal AI ☆81 · Updated last week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆277 · Updated last month
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆174 · Updated last week
- ☆71 · Updated 9 months ago
- Fast low-bit matmul kernels in Triton ☆413 · Updated last week
- ☆91 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆108 · Updated 2 months ago
- ☆115 · Updated 7 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆71 · Updated last month
- [WIP] Better (FP8) attention for Hopper ☆32 · Updated 10 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆217 · Updated 2 weeks ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆73 · Updated this week
- A block-oriented training approach for inference-time optimization ☆34 · Updated last year
- Make Triton easier ☆49 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- Hand-rolled GPU communications library ☆76 · Updated last month