HPMLL / SpInfer_EuroSys25

☆27

Alternatives and similar repositories for SpInfer_EuroSys25

Users who are interested in SpInfer_EuroSys25 compare it to the libraries listed below.

snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆155 Updated last year
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116 Updated last year
LoongServe / LoongServe
☆124 Updated 11 months ago
PanZaifeng / FastTree-Artifact
☆25 Updated 7 months ago
DD-DuDa / BitLadder
A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.
☆60 Updated this week
HPMLL / DTC-SpMM_ASPLOS24
☆39 Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181 Updated 2 weeks ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆120 Updated 4 months ago
EfficientLLMSys / MuxServe
☆13 Updated last year
hao-ai-lab / MuxServe
☆74 Updated last week
ranggihwang / Pregated_MoE
☆55 Updated last year
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆138 Updated 9 months ago
SJTU-ReArch-Group / Paper-Reading-List
☆130 Updated this week
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆213 Updated 3 months ago
microsoft / SparTA
☆153 Updated last year
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆45 Updated last month
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆155 Updated 3 weeks ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆278 Updated 7 months ago
uchuhimo / amanda
☆18 Updated last year
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆59 Updated 7 months ago
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆94 Updated 3 months ago
goliaro / specinfer-ae
☆24 Updated last year
infinigence / SpecEE
Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)
☆64 Updated 6 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆118 Updated last month
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆43 Updated 10 months ago
AlibabaResearch / mononn
☆31 Updated last year
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆53 Updated last year
monellz / FlashTensor
☆16 Updated 7 months ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67 Updated 7 months ago