deepseek-ai / LPLB

An early research stage MoE load balancer based on linear programming.

☆228

Alternatives and similar repositories for LPLB

Users who are interested in LPLB are comparing it to the libraries listed below.

tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆178 Updated this week
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆385 Updated last week
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆291 Updated this week
sgl-project / sglang-jax
JAX backend for SGL
☆175 Updated this week
perplexityai / pplx-garden
Perplexity open source garden for inference technology
☆232 Updated this week
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: An Unified Checkpointing Library for LFMs
☆252 Updated 4 months ago
meta-pytorch / BackendBench
How to ensure correctness and ship LLM generated kernels in PyTorch
☆121 Updated last week
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆529 Updated 2 weeks ago
PKU-SEC-Lab / HybriMoE
[DAC'25] Official implement of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆89 Updated 5 months ago
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆171 Updated last week
thunlp / TritonBench
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
☆95 Updated 5 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆300 Updated this week
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆154 Updated last month
microsoft / AttentionEngine
☆109 Updated 6 months ago
stepfun-ai / StepMesh
☆316 Updated last week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆230 Updated last week
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆259 Updated last month
hao-ai-lab / MuxServe
☆79 Updated last month
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆286 Updated this week
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆73 Updated 6 months ago
hao-ai-lab / LookaheadReasoning
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
☆52 Updated 3 weeks ago
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆143 Updated 9 months ago
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆167 Updated last week
triton-lang / kernels
☆93 Updated last year
meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation via Deep Agents
☆137 Updated this week
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆196 Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆187 Updated last month
flexflow / flexflow-serve
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆63 Updated 2 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆167 Updated 7 months ago
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆99 Updated last week