ailzhang / minPP
Pipeline parallelism for the minimalist
☆35 Updated 2 months ago
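minPP's subject, pipeline parallelism, splits a model into sequential stages and feeds microbatches through them in a staggered schedule so stages work concurrently. As a rough illustration only (a pure-Python sketch of a GPipe-style forward schedule, not minPP's actual API — the function name and structure here are hypothetical):

```python
def gpipe_forward_schedule(num_stages: int, num_microbatches: int):
    """Return a list of time steps; each step lists the (stage, microbatch)
    pairs that run concurrently. Stage s processes microbatch m at time
    step s + m, producing the classic pipeline "staircase"."""
    total_steps = num_stages + num_microbatches - 1  # fill + steady + drain
    steps = []
    for t in range(total_steps):
        # At time t, stage s holds microbatch t - s, if that index is valid.
        step = [(s, t - s) for s in range(num_stages)
                if 0 <= t - s < num_microbatches]
        steps.append(step)
    return steps

schedule = gpipe_forward_schedule(num_stages=3, num_microbatches=4)
for t, step in enumerate(schedule):
    print(t, step)
```

With 3 stages and 4 microbatches the pipeline fills over the first two steps, runs all stages in parallel in the middle, then drains; the bubble (idle slots during fill and drain) is what schedules like 1F1B and interleaving try to shrink.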
Alternatives and similar repositories for minPP
Users interested in minPP are comparing it to the libraries listed below.
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆54 Updated last week
- [WIP] Better (FP8) attention for Hopper ☆33 Updated 7 months ago
- An extensible collectives library in Triton ☆89 Updated 6 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆58 Updated 2 weeks ago
- How to ensure correctness and ship LLM-generated kernels in PyTorch ☆66 Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆74 Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆223 Updated last year
- Fast low-bit matmul kernels in Triton ☆379 Updated last week
- ☆89 Updated last year
- Load compute kernels from the Hub ☆293 Updated last week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆230 Updated 5 months ago
- ☆173 Updated last year
- ☆72 Updated 6 months ago
- ring-attention experiments ☆152 Updated 11 months ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆414 Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆167 Updated this week
- ☆99 Updated 4 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆214 Updated this week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆193 Updated 2 months ago
- ☆242 Updated this week
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated 6 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆194 Updated 4 months ago
- A bunch of kernels that might make stuff slower 😉 ☆59 Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆252 Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆270 Updated this week
- Quantized LLM training in pure CUDA/C++ ☆180 Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer for Triton Kernels ☆152 Updated last week
- PyTorch RFCs (experimental) ☆135 Updated 4 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆82 Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI ☆144 Updated last year