ailzhang / minPP
Pipeline parallelism for the minimalist
☆37 · Updated 3 months ago
Alternatives and similar repositories for minPP
Users interested in minPP are comparing it to the libraries listed below.
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆61 · Updated last week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆57 · Updated last week
- Ship correct and fast LLM kernels to PyTorch ☆124 · Updated 2 weeks ago
- ☆90 · Updated last year
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆170 · Updated this week
- Fast low-bit matmul kernels in Triton ☆401 · Updated last week
- ☆71 · Updated 8 months ago
- ring-attention experiments ☆160 · Updated last year
- extensible collectives library in Triton ☆91 · Updated 8 months ago
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support ☆187 · Updated this week
- 👷 Build compute kernels ☆186 · Updated this week
- ☆256 · Updated this week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆132 · Updated last week
- Load compute kernels from the Hub ☆335 · Updated this week
- Boosting 4-bit inference kernels with 2:4 sparsity ☆86 · Updated last year
- Make Triton easier ☆49 · Updated last year
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆321 · Updated this week
- torchcomms: a modern PyTorch communications API ☆295 · Updated this week
- PTX tutorial written purely by AIs (Deep Research by OpenAI and Claude 3.7) ☆66 · Updated 8 months ago
- [WIP] Better (FP8) attention for Hopper ☆32 · Updated 9 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆244 · Updated 6 months ago
- Easy, Fast, and Scalable Multimodal AI ☆73 · Updated last week
- ☆111 · Updated 6 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆196 · Updated 5 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆217 · Updated this week
- A collection of reproducible inference engine benchmarks ☆37 · Updated 7 months ago
- Quantized LLM training in pure CUDA/C++. ☆218 · Updated this week
- ☆27 · Updated last year
- TritonParse: a compiler tracer, visualizer, and reproducer for Triton kernels ☆175 · Updated last week
- ☆177 · Updated last year