PluralisResearch / AsyncPP
Asynchronous pipeline parallel optimization
☆19 · Updated last week
Alternatives and similar repositories for AsyncPP
Users interested in AsyncPP are comparing it to the libraries listed below.
- ☆46 · Updated 10 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆87 · Updated last week
- AI model training on heterogeneous, geo-distributed resources ☆34 · Updated 2 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆128 · Updated 7 months ago
- Easy, Fast, and Scalable Multimodal AI ☆109 · Updated this week
- AI-Driven Research Systems (ADRS) ☆119 · Updated last month
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆116 · Updated 3 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆63 · Updated 3 months ago
- ring-attention experiments ☆165 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆141 · Updated last year
- Fast and memory-efficient exact attention ☆18 · Updated 2 weeks ago
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ☆47 · Updated 3 months ago
- Debug print operator for cudagraph debugging ☆14 · Updated last year
- Implementation from scratch in C of the multi-head latent attention used in the Deepseek-v3 technical paper ☆19 · Updated last year
- ☆38 · Updated 6 months ago
- Transformers components but in Triton ☆34 · Updated 9 months ago
- DeeperGEMM: crazy optimized version ☆73 · Updated 9 months ago
- Expert Specialization MoE Solution based on CUTLASS ☆27 · Updated 3 weeks ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆55 · Updated last year
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆191 · Updated this week
- Triton-based Symmetric Memory operators and examples ☆81 · Updated 3 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆176 · Updated last year
- ☆52 · Updated 8 months ago
- A minimal implementation of vllm ☆66 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated 3 months ago
- ☆270 · Updated 8 months ago
- ☆47 · Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆123 · Updated 7 months ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆48 · Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆75 · Updated this week