zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆106 · Updated 2 weeks ago
Alternatives and similar repositories for shallowsim:
Users interested in shallowsim are comparing it to the libraries listed below.
- High performance Transformer implementation in C++. ☆115 · Updated 2 months ago
- ☆39 · Updated last week
- Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆61 · Updated 10 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆79 · Updated 4 months ago
- Microsoft Collective Communication Library ☆65 · Updated 4 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆106 · Updated 2 months ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆34 · Updated this week
- ☆94 · Updated 5 months ago
- ☆78 · Updated 2 years ago
- NCCL Profiling Kit ☆129 · Updated 9 months ago
- ☆36 · Updated 4 months ago
- ☆56 · Updated 10 months ago
- Thunder Research Group's Collective Communication Library ☆34 · Updated 11 months ago
- Stateful LLM Serving ☆58 · Updated last month
- Ultra | Ultimate | Unified CCL ☆58 · Updated 2 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆58 · Updated 11 months ago
- A resilient distributed training framework ☆94 · Updated last year
- Synthesizer for optimal collective communication algorithms ☆105 · Updated last year
- LLM serving cluster simulator ☆96 · Updated 11 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆93 · Updated last year
- DeeperGEMM: crazy optimized version ☆65 · Updated last week
- ☆23 · Updated 9 months ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI '23) ☆81 · Updated last year
- [OSDI '24] Serving LLM-based Applications Efficiently with Semantic Variable ☆153 · Updated 6 months ago
- PyTorch distributed training acceleration framework ☆47 · Updated 2 months ago
- ☆47 · Updated this week
- ☆39 · Updated 10 months ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …) ☆156 · Updated 9 months ago
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLMs ☆36 · Updated last month
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆113 · Updated last year