bytedance-iaas / splitwise-demos

☆26

Alternatives and similar repositories for splitwise-demos

Users who are interested in splitwise-demos are comparing it to the repositories listed below.

infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆87 Updated last month
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆269 Updated 3 weeks ago
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆42 Updated 4 months ago
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆49 Updated 4 months ago
triton-inference-server / triton_distributed
☆50 Updated 3 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆143 Updated last week
zw0610 / zw0610.github.io
☆58 Updated 4 years ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆380 Updated 3 weeks ago
alibaba / llm-scheduling-artifact
Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆61 Updated last year
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆59 Updated last year
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆422 Updated this week
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆437 Updated last week
InternLM / turbomind
☆86 Updated 3 months ago
WukLab / preble
Stateful LLM Serving
☆73 Updated 3 months ago
shenh10 / DeepSeek_Simulator
☆73 Updated 2 months ago
Azure / msccl-executor-nccl
☆37 Updated 6 months ago
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆149 Updated 3 months ago
simon-mo / vLLM-Benchmark
☆28 Updated 2 months ago
Azure / msccl
Microsoft Collective Communication Library
☆64 Updated 7 months ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆100 Updated last year
microsoft / NPKit
NCCL Profiling Kit
☆138 Updated 11 months ago
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA and CuTe APIs, and achieve peak⚡️ performance.
☆80 Updated last month
FlagOpen / FlagCX
☆69 Updated last week
BBuf / tensorrt-llm-moe
☆29 Updated 4 months ago
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆246 Updated 2 weeks ago
yifuwang / symm-mem-recipes
☆90 Updated 5 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆125 Updated 5 months ago
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆117 Updated last year
AlibabaPAI / FLASHNN
☆96 Updated 9 months ago
flexflow / flexflow-serve
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆42 Updated last month