An early-research-stage expert-parallel load balancer for MoE models, based on linear programming.
☆499 Nov 19, 2025 Updated 3 months ago
Alternatives and similar repositories for LPLB
Users who are interested in LPLB are comparing it to the libraries listed below.
- ☆32 Jul 2, 2025 Updated 8 months ago
- ☆88 May 31, 2025 Updated 9 months ago
- An experimental communicating attention kernel based on DeepEP. ☆35 Jul 29, 2025 Updated 7 months ago
- Website for CSE 234, Winter 2025 ☆13 Mar 24, 2025 Updated 11 months ago
- ☆15 Feb 24, 2026 Updated last week
- Cute layout visualization ☆30 Jan 18, 2026 Updated last month
- ☆53 Feb 24, 2026 Updated last week
- Distributed Compiler based on Triton for Parallel Systems ☆1,371 Feb 13, 2026 Updated 2 weeks ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) ☆30 Jan 22, 2026 Updated last month
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆129 Jun 24, 2025 Updated 8 months ago
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆446 Feb 4, 2026 Updated last month
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆469 Updated this week
- Debug print operator for cudagraph debugging ☆14 Aug 2, 2024 Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆163 Feb 11, 2026 Updated 3 weeks ago
- DeeperGEMM: crazy optimized version ☆74 May 5, 2025 Updated 9 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,261 Aug 28, 2025 Updated 6 months ago
- Official Project Page for HLA: Higher-order Linear Attention (https://arxiv.org/abs/2510.27258) ☆45 Jan 6, 2026 Updated last month
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang ☆44 Nov 19, 2025 Updated 3 months ago
- ☆52 May 19, 2025 Updated 9 months ago
- ☆347 Jan 28, 2026 Updated last month
- Expert Parallelism Load Balancer ☆1,351 Mar 24, 2025 Updated 11 months ago
- ☆95 Apr 2, 2025 Updated 11 months ago
- ☆87 Updated this week
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh. ☆52 Feb 25, 2026 Updated last week
- ☆38 Aug 7, 2025 Updated 6 months ago
- ☆49 Feb 5, 2026 Updated 3 weeks ago
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems. ☆29 Feb 3, 2026 Updated last month
- Perplexity GPU Kernels ☆567 Nov 7, 2025 Updated 3 months ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols ☆17 Nov 19, 2025 Updated 3 months ago
- The official GitHub repo for "Diffusion Language Models are Super Data Learners". ☆223 Nov 6, 2025 Updated 3 months ago
- ☆118 May 19, 2025 Updated 9 months ago
- Analyze computation-communication overlap in V3/R1. ☆1,143 Mar 21, 2025 Updated 11 months ago
- Canvas: End-to-End Kernel Architecture Search in Neural Networks ☆27 Nov 18, 2024 Updated last year
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 Aug 9, 2025 Updated 6 months ago
- ☆10 Jun 28, 2025 Updated 8 months ago
- PyTorch routines for (Ker)nel (Mac)hines ☆10 Oct 10, 2025 Updated 4 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training. ☆2,926 Jan 14, 2026 Updated last month
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. ☆54 Feb 6, 2026 Updated 3 weeks ago
- Compiler for Dynamic Neural Networks ☆45 Nov 13, 2023 Updated 2 years ago