chenhongyu2048 / LLM-inference-optimization-paperLinks

Summary of some awesome work for optimizing LLM inference

☆120

Alternatives and similar repositories for LLM-inference-optimization-paper

Users that are interested in LLM-inference-optimization-paper are comparing it to the libraries listed below

Sorting:

galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆278Updated 7 months ago
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆155Updated 3 weeks ago
MoE-Inf / awesome-moe-inference
Curated collection of papers in MoE model inference
☆290Updated last week
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆155Updated last year
lambda7xx / awesome-AI-system
paper and its code for AI System
☆331Updated 2 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆138Updated 9 months ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆282Updated 4 months ago
TreeAI-Lab / Awesome-KV-Cache-Management
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co…
☆221Updated 2 months ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116Updated last year
sunkx109 / GPUs-Specs
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM
☆63Updated 2 months ago
Hsword / Awesome-Machine-Learning-System-Papers
☆77Updated 3 years ago
infinigence / SpecEE
Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)
☆64Updated 6 months ago
LoongServe / LoongServe
☆124Updated 11 months ago
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆215Updated 3 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆126Updated last week
harleyszhang / llm_counts
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
☆108Updated 3 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181Updated 2 weeks ago
YaoJiayi / CacheBlend
☆141Updated 3 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆117Updated last month
Zefan-Cai / Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
☆379Updated 7 months ago
goliaro / specinfer-ae
☆24Updated last year
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆147Updated 3 years ago
zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆170Updated 7 months ago
hahnyuan / LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…
☆567Updated last year
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆113Updated 5 months ago
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆94Updated 3 months ago
byungsoo-oh / ml-systems-papers
Curated collection of papers in machine learning systems
☆433Updated 3 weeks ago
shenh10 / DeepSeek_Simulator
☆90Updated 6 months ago
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆30Updated last year
SJTU-ReArch-Group / Paper-Reading-List
☆130Updated this week