deepseek-ai/EPLB

Expert Parallelism Load Balancer

☆1,407

Alternatives and similar repositories for EPLB

Users who are interested in EPLB are comparing it to the libraries listed below.

deepseek-ai / profile-data
Analyze computation-communication overlap in V3/R1.
☆1,176 · Mar 21, 2025 · Updated last year
deepseek-ai / DualPipe
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
☆2,983 · Jan 14, 2026 · Updated 6 months ago
deepseek-ai / DeepGEMM
DeepGEMM: clean and efficient BLAS kernel library on GPU
☆7,558 · Updated this week
deepseek-ai / DeepEP
DeepEP: an efficient expert-parallel communication library
☆9,888 · Jul 14, 2026 · Updated last week
deepseek-ai / FlashMLA
FlashMLA: Efficient Multi-head Latent Attention Kernels
☆12,776 · Apr 30, 2026 · Updated 2 months ago
deepseek-ai / 3FS
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
☆10,077 · May 7, 2026 · Updated 2 months ago
deepseek-ai / smallpond
A lightweight data processing framework built on DuckDB and 3FS.
☆4,971 · Mar 5, 2025 · Updated last year
deepseek-ai / open-infra-index
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
☆8,032 · May 15, 2025 · Updated last year
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345 · Aug 28, 2025 · Updated 10 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,498 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆6,032 · Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,999 · Updated this week
stepfun-ai / StepMesh
☆379 · Jan 28, 2026 · Updated 5 months ago
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆593 · Nov 7, 2025 · Updated 8 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 3 weeks ago
deepseek-ai / LPLB
An early research stage expert-parallel load balancer for MoE models based on linear programming.
☆516 · Nov 19, 2025 · Updated 8 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,151 · Updated this week
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,448 · Updated this week
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,513 · Aug 3, 2025 · Updated 11 months ago
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,908 · Updated this week
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,580 · Updated this week
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,733 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,125 · Updated this week
MoonshotAI / MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
☆2,153 · Apr 3, 2025 · Updated last year
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆888 · Updated this week
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆982 · Jul 4, 2026 · Updated 3 weeks ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆464 · May 7, 2025 · Updated last year
fla-org / native-sparse-attention
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
☆1,014 · Feb 5, 2026 · Updated 5 months ago
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,212 · Updated this week
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
deepseek-ai / DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
☆1,953 · Jan 16, 2024 · Updated 2 years ago
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,629 · Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,070 · Updated this week
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆425 · Nov 13, 2025 · Updated 8 months ago
NVIDIA / TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat…
☆14,205 · Updated this week
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆861 · Jan 5, 2026 · Updated 6 months ago
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆826 · Apr 6, 2025 · Updated last year
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆1,285 · Updated this week