PDZZXL / Awesome-LLM-Serving
Large Language Model (LLM) Serving Paper and Resource List
☆24 Updated 6 months ago
Alternatives and similar repositories for Awesome-LLM-Serving
Users who are interested in Awesome-LLM-Serving are comparing it to the repositories listed below
- ☆136 Updated 3 weeks ago
- ☆205 Updated 3 weeks ago
- This repository is established to store personal notes and annotated papers during daily research. ☆161 Updated last week
- ☆54 Updated 4 months ago
- LLM Inference analyzer for different hardware platforms ☆97 Updated 4 months ago
- LLM serving cluster simulator ☆120 Updated last year
- ☆12 Updated last year
- ☆23 Updated last year
- WaferLLM: Large Language Model Inference at Wafer Scale ☆73 Updated 3 weeks ago
- The code based on vLLM for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”. ☆11 Updated last year
- ☆156 Updated last year
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25) ☆67 Updated 6 months ago
- Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc. ☆37 Updated last week
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale ☆157 Updated 4 months ago
- Summary of some awesome work for optimizing LLM inference ☆138 Updated 2 weeks ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆161 Updated last year
- DeepSeek-V3/R1 inference performance simulator ☆168 Updated 7 months ago
- OSDI 2023 Welder, deep learning compiler ☆27 Updated last year
- ☆90 Updated 7 months ago
- NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing ☆100 Updated last year
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow" ☆71 Updated last month
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆55 Updated last year
- ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale ☆466 Updated last week
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER. ☆40 Updated last year
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆103 Updated 2 years ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆116 Updated 3 years ago
- ☆40 Updated 2 years ago
- Github repository of HPCA 2025 paper "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures" ☆15 Updated 2 months ago
- ☆79 Updated 3 years ago
- ☆45 Updated last year