zartbot / shallowsim

DeepSeek-V3/R1 inference performance simulator

☆156

Alternatives and similar repositories for shallowsim

Users who are interested in shallowsim are comparing it to the libraries listed below.

shenh10 / DeepSeek_Simulator
☆81 Updated 4 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆154 Updated last month
uccl-project / uccl
Ultra and Unified CCL
☆440 Updated this week
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆128 Updated 6 months ago
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆103 Updated 2 months ago
sunkx109 / GPUs-Specs
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM
☆53 Updated 4 months ago
WukLab / preble
Stateful LLM Serving
☆77 Updated 4 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆115 Updated this week
mutinifni / splitwise-sim
LLM serving cluster simulator
☆108 Updated last year
LoongServe / LoongServe
☆109 Updated 8 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆114 Updated 3 weeks ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆142 Updated 6 months ago
microsoft / NPKit
NCCL Profiling Kit
☆139 Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆62 Updated last year
galeselee / Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…
☆263 Updated 4 months ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆144 Updated 3 years ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆390 Updated this week
calculon-ai / calculon
☆145 Updated last year
Azure / msccl
Microsoft Collective Communication Library
☆63 Updated 8 months ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆88 Updated last month
parasailteam / coconet
☆80 Updated 2 years ago
stepfun-ai / StepMesh
☆24 Updated last week
JF-D / Proteus
☆23 Updated last year
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆60 Updated last year
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆80 Updated 3 weeks ago
HPMLL / BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆189 Updated last week
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆136 Updated last week
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆405 Updated 2 months ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆397 Updated 2 months ago
Hsword / Awesome-Machine-Learning-System-Papers
☆74 Updated 3 years ago