casys-kaist / LLMServingSim

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale

☆160

Alternatives and similar repositories for LLMServingSim

Users interested in LLMServingSim compare it to the repositories listed below.

mutinifni / splitwise-sim
LLM serving cluster simulator
☆122 Updated last year
PrincetonUniversity / LLMCompass
☆209 Updated last month
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆96 Updated 4 months ago
VIA-Research / vTrain
☆73 Updated 6 months ago
casys-kaist / NeuPIMs
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing
☆102 Updated last year
sitar-lab / NeuSight
☆57 Updated 5 months ago
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆33 Updated last year
Yufeng98 / CENT
Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025
☆111 Updated 7 months ago
AIS-SNU / PID-Comm
☆27 Updated last year
upmem / upmem_llm_framework
UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers/functions with a given hardware profile.
☆37 Updated 3 months ago
VIA-Research / uPIMulator
☆155 Updated 10 months ago
ranggihwang / Pregated_MoE
☆57 Updated last year
AIS-SNU / Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆48 Updated 4 months ago
SJTU-ReArch-Group / Paper-Reading-List
☆139 Updated 2 weeks ago
scale-snu / attacc_simulator
☆113 Updated last year
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55 Updated last year
MeshInfra / WaferLLM
WaferLLM: Large Language Model Inference at Wafer Scale
☆76 Updated last month
platformxlab / G10
☆40 Updated 2 years ago
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆12 Updated last year
PSAL-POSTECH / ONNXim
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference
☆171 Updated this week
astra-sim / tacos
TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning
☆29 Updated 5 months ago
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆74 Updated last month
calculon-ai / calculon
☆160 Updated last year
PSAL-POSTECH / M2NDP-public
A Cycle-level simulator for M2NDP
☆32 Updated 3 months ago
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆56 Updated 4 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆137 Updated last month
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆47 Updated 11 months ago
PKUZHOU / NeoMem-MICRO-2024
The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
☆59 Updated last year
ConvolutedDog / HyFiSS
HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs
☆38 Updated 11 months ago
mlcommons / chakra
Repository for MLCommons Chakra schema and tools
☆142 Updated last month