ASISys/Adrenaline

Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation

☆42

Alternatives and similar repositories for Adrenaline

Users who are interested in Adrenaline are comparing it to the repositories listed below.

dmemsys / CHIME
This is the implementation repository of our SOSP'24 paper: CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated M…
☆27 · Nov 7, 2024 · Updated last year
MachineLearningSystem / 25ASPLOS-Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆12 · Nov 8, 2024 · Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving”
☆64 · Jun 5, 2024 · Updated 2 years ago
PDZZXL / Awesome-LLM-Serving
Large Language Model (LLM) Serving Paper and Resource List
☆29 · Jul 16, 2026 · Updated last week
flashserve / PAT
Prefix-Aware Attention for LLM Decoding
☆41 · May 26, 2026 · Updated 2 months ago
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆127 · Dec 25, 2025 · Updated 7 months ago
aoli-al / HFuse
Horizontal Fusion
☆24 · Jan 7, 2022 · Updated 4 years ago
google / rago
☆31 · Jun 22, 2025 · Updated last year
smcdef / memory-reordering
A sample kernel module demonstrating memory reordering.
☆13 · May 30, 2020 · Updated 6 years ago
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆39 · Feb 29, 2024 · Updated 2 years ago
dmemsys / Ditto
This is the implementation repository of our SOSP'23 paper: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System.
☆38 · Sep 24, 2023 · Updated 2 years ago
mi150 / VaLoRA
☆11 · May 19, 2025 · Updated last year
jiachengh / Fleet
☆13 · Mar 18, 2024 · Updated 2 years ago
appl-lab / CuTS
☆13 · Sep 8, 2021 · Updated 4 years ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆511 · Jan 8, 2026 · Updated 6 months ago
Thesys-lab / fast23-GLCache
Repository for FAST'23 paper GL-Cache: Group-level Learning for Efficient and High-Performance Caching
☆51 · May 12, 2023 · Updated 3 years ago
LoongServe / LoongServe
☆135 · Nov 11, 2024 · Updated last year
wanghongfei / hello
Anonymous online chat and friend-making website implemented with WebSocket
☆16 · Sep 17, 2015 · Updated 10 years ago
mitosis-project / mitosis-asplos20-artifact
☆16 · Nov 26, 2020 · Updated 5 years ago
kygx-legend / vsgm
☆11 · Nov 14, 2023 · Updated 2 years ago
flashserve / RAGPulse
An Open-Source RAG Workload Trace to Optimize RAG Serving Systems
☆37 · Nov 18, 2025 · Updated 8 months ago
TJU-NSL / awesome-papers
☆37 · Updated this week
TankLabTJU / INFless
The source code of INFless, a native serverless platform for AI inference.
☆46 · Oct 10, 2022 · Updated 3 years ago
LighT-chenml / GPHash
☆14 · Dec 20, 2024 · Updated last year
AI-Infra-Team / awesome-papers
Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc.
☆69 · Mar 4, 2026 · Updated 4 months ago
minghust / ford
[FAST 2022] FORD: Fast One-sided RDMA-based Distributed Transactions for Disaggregated Persistent Memory
☆62 · Jun 22, 2024 · Updated 2 years ago
PanZaifeng / FastTree-Artifact
☆32 · Mar 24, 2025 · Updated last year
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆47 · May 13, 2025 · Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆94 · Jul 14, 2023 · Updated 3 years ago
sakura-ysy / rdma-rpc
C++ RPC based on RDMA
☆13 · Sep 12, 2023 · Updated 2 years ago
ClickHouse / NuRaft
C++ implementation of Raft core logic as a replication library
☆10 · Jul 8, 2026 · Updated 3 weeks ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆157 · Apr 25, 2024 · Updated 2 years ago
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
dmemsys / Aceso
This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value …
☆24 · Oct 20, 2024 · Updated last year
StyxXuan / LoraRetriever
☆17 · Apr 29, 2025 · Updated last year
caihuaiguang / ORRIC
INFOCOM 2024: Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference
☆34 · Oct 13, 2024 · Updated last year
akhtarnabeel / COSE-Serverless-Configuration
COSE: Configuring Serverless Functions using Statistical Learning
☆10 · Jun 28, 2023 · Updated 3 years ago
Quangmire / voyager
☆24 · Apr 10, 2022 · Updated 4 years ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …
☆330 · Jun 10, 2025 · Updated last year