sihyeong / Awesome-LLM-Inference-EngineLinks

☆138

Alternatives and similar repositories for Awesome-LLM-Inference-Engine

Users that are interested in Awesome-LLM-Inference-Engine are comparing it to the libraries listed below

Sorting:

Zefan-Cai / Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
☆376Updated 7 months ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆278Updated 4 months ago
InternLM / Awesome-LLM-Training-System
☆43Updated last year
pprp / Awesome-LLM-Quantization
Awesome list for LLM quantization
☆326Updated last week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆439Updated this week
hahnyuan / LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…
☆567Updated last year
TreeAI-Lab / Awesome-KV-Cache-Management
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co…
☆221Updated 2 months ago
PKU-SEC-Lab / HybriMoE
[DAC'25] Official implement of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆75Updated 4 months ago
OpenBMB / CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…
☆199Updated 2 weeks ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆120Updated 6 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆278Updated 7 months ago
infinigence / SpecEE
Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)
☆64Updated 5 months ago
MoE-Inf / awesome-moe-inference
Curated collection of papers in MoE model inference
☆285Updated last month
October2001 / Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆566Updated 3 weeks ago
madsys-dev / deepseekv2-profile
☆148Updated 7 months ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆115Updated last year
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆120Updated 4 months ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆431Updated last week
InternLM / turbomind
☆96Updated 6 months ago
shishishu / LLM-Inference-Acceleration
LLM Inference with Deep Learning Accelerator.
☆52Updated 9 months ago
YaoJiayi / CacheBlend
☆141Updated 3 months ago
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆615Updated 3 weeks ago
UChi-JCL / CacheGen
☆136Updated last year
LDLINGLINGLING / nano_vllm_note
注释的nano_vllm仓库，并且完成了MiniCPM4的适配以及注册新模型的功能
☆81Updated 2 months ago
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆60Updated 11 months ago
harleyszhang / llm_counts
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
☆108Updated 3 months ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆338Updated 3 months ago
hao-ai-lab / MuxServe
☆74Updated last week
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆104Updated this week
WukLab / preble
Stateful LLM Serving
☆87Updated 7 months ago