infinigence / Semi-PD

A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.

☆116

Alternatives and similar repositories for Semi-PD

Users who are interested in Semi-PD are comparing it to the libraries listed below.

infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆188 Updated last month
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆82 Updated last week
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆144 Updated 2 months ago
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆169 Updated 8 months ago
shenh10 / DeepSeek_Simulator
☆90 Updated 8 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆142 Updated 10 months ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving”
☆63 Updated last year
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆119 Updated last year
WukLab / preble
Stateful LLM Serving
☆89 Updated 8 months ago
sunkx109 / GPUs-Specs
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM
☆67 Updated 3 months ago
AlibabaPAI / FLASHNN
☆102 Updated last year
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆82 Updated this week
hao-ai-lab / MuxServe
☆79 Updated last month
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 Updated 2 months ago
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆134 Updated 6 months ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆65 Updated last year
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆70 Updated last week
kwai / Megatron-Kwai
LLM training technologies developed by kwai
☆66 Updated last week
CalebDu / Awesome-Cute
☆112 Updated 6 months ago
stepfun-ai / StepMesh
☆324 Updated 3 weeks ago
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆80 Updated last year
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆276 Updated 3 months ago
Infrawaves / DeepEP_ibrc_dual-ports_multiQP
Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport
☆68 Updated 6 months ago
OpenPPL / ppl.llm.kernel.cuda
☆152 Updated 10 months ago
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆53 Updated 3 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆120 Updated 2 months ago
flagos-ai / FlagCX
☆131 Updated this week
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆86 Updated this week
LoongServe / LoongServe
☆124 Updated last year
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆211 Updated this week