ailzhang / minPP
Pipeline parallelism for the minimalist
☆38 · Updated 5 months ago
Alternatives and similar repositories for minPP
Users who are interested in minPP are comparing it to the libraries listed below.
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆285 · Updated 2 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆63 · Updated last week
- PyTorch Distributed-native training library for LLMs/VLMs with out-of-the-box (OOTB) Hugging Face support ☆259 · Updated this week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆205 · Updated last week
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆223 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 · Updated last week
- Ship correct and fast LLM kernels to PyTorch ☆139 · Updated 2 weeks ago
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 10 months ago
- Our first fully AI-generated deep learning system ☆429 · Updated last week
- LM engine is a library for pretraining/finetuning LLMs ☆113 · Updated this week
- [WIP] Better (FP8) attention for Hopper ☆32 · Updated 11 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 · Updated this week
- A block oriented training approach for inference time optimization. ☆34 · Updated last year
- ☆219 · Updated last year
- Fast low-bit matmul kernels in Triton ☆424 · Updated this week
- Explore training for quantized models ☆26 · Updated 6 months ago
- Easy, Fast, and Scalable Multimodal AI ☆106 · Updated this week
- ☆117 · Updated 3 weeks ago
- Extensible collectives library in Triton ☆93 · Updated 10 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆474 · Updated 2 weeks ago
- 👷 Build compute kernels ☆214 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆114 · Updated this week
- ☆178 · Updated last year
- Load compute kernels from the Hub ☆381 · Updated last week
- ☆277 · Updated this week
- torchcomms: a modern PyTorch communications API ☆323 · Updated this week
- A collection of reproducible inference engine benchmarks ☆38 · Updated 9 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆250 · Updated 8 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆80 · Updated last month