ailzhang / minPP
Pipeline parallelism for the minimalist
☆40 Updated 6 months ago
Alternatives and similar repositories for minPP
Users who are interested in minPP are comparing it to the libraries listed below.
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆71 Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆64 Updated 3 weeks ago
- Our first fully AI generated deep learning system ☆481 Updated last week
- Ship correct and fast LLM kernels to PyTorch ☆141 Updated 3 weeks ago
- Make triton easier ☆50 Updated last year
- extensible collectives library in triton ☆95 Updated 10 months ago
- A bunch of kernels that might make stuff slower 😉 ☆75 Updated last week
- A block oriented training approach for inference time optimization. ☆34 Updated last year
- ring-attention experiments ☆165 Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 Updated last year
- [WIP] Better (FP8) attention for Hopper ☆32 Updated 11 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated 10 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆226 Updated last year
- Explore training for quantized models ☆26 Updated 7 months ago
- A collection of reproducible inference engine benchmarks ☆38 Updated 9 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 Updated last year
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆79 Updated last month
- ☆28 Updated last year
- ☆288 Updated this week
- ☆71 Updated 10 months ago
- Home for OctoML PyTorch Profiler ☆113 Updated 2 years ago
- Quantized LLM training in pure CUDA/C++. ☆238 Updated 3 weeks ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆93 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆115 Updated last year
- Fast and Furious AMD Kernels ☆350 Updated 2 weeks ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆194 Updated last week
- Fast low-bit matmul kernels in Triton ☆429 Updated last week
- PyTorch centric eager mode debugger ☆48 Updated last year
- ☆177 Updated 2 years ago
- ☆15 Updated 3 months ago