stepfun-ai / StepMeshLinks

☆324

Alternatives and similar repositories for StepMesh

Users that are interested in StepMesh are comparing it to the libraries listed below

Sorting:

infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆188Updated last month
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆142Updated 10 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆144Updated 2 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆120Updated 2 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆444Updated 6 months ago
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆177Updated this week
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆450Updated last month
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆72Updated 5 months ago
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆169Updated 8 months ago
kwai / Megatron-Kwai
LLM training technologies developed by kwai
☆66Updated last week
LoongServe / LoongServe
☆124Updated last year
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆512Updated 3 months ago
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆116Updated 6 months ago
WukLab / preble
Stateful LLM Serving
☆89Updated 8 months ago
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆531Updated 3 weeks ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆437Updated 6 months ago
madsys-dev / deepseekv2-profile
☆152Updated 8 months ago
hao-ai-lab / MuxServe
☆79Updated last month
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆80Updated last year
yifuwang / symm-mem-recipes
☆147Updated 11 months ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: An Unified Checkpointing Library for LFMs
☆254Updated this week
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆437Updated last week
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆53Updated 3 months ago
AlibabaPAI / FLASHNN
☆102Updated last year
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆282Updated 8 months ago
shenh10 / DeepSeek_Simulator
☆90Updated 8 months ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆199Updated last year
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆498Updated this week
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆262Updated last month
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆292Updated 5 months ago