AISys-01 / vllm-CachedAttention

The code based on vLLM for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”.

☆11

Alternatives and similar repositories for vllm-CachedAttention

Users who are interested in vllm-CachedAttention are comparing it to the repositories listed below.


platformxlab / G10
☆40 Updated 2 years ago
PKUZHOU / NeoMem-MICRO-2024
The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
☆59 Updated last year
ferry-hhh / CXL-DMSim
CXL-DMSim: A Full-System CXL Disaggregated Memory Simulator With Comprehensive Silicon Validation
☆104 Updated last month
MeshInfra / WaferLLM
WaferLLM: Large Language Model Inference at Wafer Scale
☆73 Updated 3 weeks ago
Yufeng98 / CENT
Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025
☆102 Updated 6 months ago
PDZZXL / Awesome-LLM-Serving
Large Language Model (LLM) Serving Paper and Resource List
☆24 Updated 6 months ago
zxhero / gem5-CXL
This is a read-only mirror of the gem5 simulator. The upstream repository is stored in https://gem5.googlesource.com, code reviews shoul…
☆36 Updated last year
OSU-STARLAB / UVM_benchmark
☆32 Updated 5 years ago
sitar-lab / NeuSight
☆54 Updated 4 months ago
casys-kaist / HUVM
☆24 Updated 3 years ago
thu-nics / UniNDP
GitHub repository of the HPCA 2025 paper "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures"
☆15 Updated 2 months ago
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆103 Updated 2 years ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55 Updated last year
tallendev / uvm-eval
This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated…
☆36 Updated 2 years ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆120 Updated last year
SNU-ARC / MERCI
☆18 Updated 4 years ago
SJTU-ReArch-Group / Paper-Reading-List
☆136 Updated 3 weeks ago
Compute-Express-Link / CXLPapers
☆110 Updated 2 years ago
hongzhangblaze / CS854-F24
☆51 Updated 2 months ago
AIS-SNU / Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆49 Updated 4 months ago
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆161 Updated last year
lingfenghsiang / Nomad
OSDI'24 Nomad implementation
☆54 Updated 3 months ago
PrincetonUniversity / LLMCompass
☆205 Updated 3 weeks ago
ucare-uchicago / ev-store-dlrm
☆31 Updated last year
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆33 Updated last year
spypaul / MQSim_CXL
☆75 Updated 2 years ago
casys-kaist / LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
☆157 Updated 4 months ago
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆97 Updated 4 months ago
csl-iisc / GPM-ASPLOS22
☆36 Updated last year
Sys-KU / AutoTiering
[USENIX ATC 2021] Exploring the Design Space of Page Management for Multi-Tiered Memory Systems
☆48 Updated 3 years ago