PDZZXL / Awesome-LLM-Serving
Large Language Model (LLM) Serving Paper and Resource List
☆24 Updated last month
Alternatives and similar repositories for Awesome-LLM-Serving
Users who are interested in Awesome-LLM-Serving are comparing it to the repositories listed below
- ☆113 Updated last week
- ☆165 Updated last year
- ☆23 Updated last year
- NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing ☆86 Updated last year
- LLM serving cluster simulator ☆107 Updated last year
- ☆11 Updated 9 months ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆142 Updated last year
- The code based on vLLM for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”. ☆9 Updated 9 months ago
- This repository is established to store personal notes and annotated papers during daily research. ☆131 Updated last week
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale ☆123 Updated 3 weeks ago
- ☆42 Updated 3 weeks ago
- ☆77 Updated last year
- ☆79 Updated 3 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆52 Updated last year
- ☆10 Updated 6 months ago
- ☆143 Updated last year
- PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training ☆17 Updated last year
- Curated collection of papers in machine learning systems ☆381 Updated last month
- LLM Inference analyzer for different hardware platforms ☆79 Updated this week
- ☆103 Updated last year
- ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale ☆393 Updated 3 weeks ago
- ☆18 Updated last year
- UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers/functions with a given hardware profile. ☆30 Updated this week
- An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation ☆48 Updated last year
- Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025 ☆76 Updated 2 months ago
- GitHub repository of HPCA 2025 paper "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures" ☆13 Updated 7 months ago
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow" ☆53 Updated 7 months ago
- ☆37 Updated last year
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows ☆62 Updated last year
- ☆48 Updated last year