Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
☆40Nov 10, 2025Updated 3 months ago
Alternatives and similar repositories for Adrenaline
Users that are interested in Adrenaline are comparing it to the libraries listed below
- Artifact of OSDI '24 paper, “Llumnix: Dynamic Scheduling for Large Language Model Serving”☆64Jun 5, 2024Updated last year
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference☆20Jan 24, 2025Updated last year
- Prefix-Aware Attention for LLM Decoding☆29Jan 23, 2026Updated last month
- A MIPS CPU with dual-issue, out-of-order, and 5-stage pipelines☆11Nov 28, 2019Updated 6 years ago
- Linux tree for ntrdma driver development.☆11Jun 29, 2017Updated 8 years ago
- Stateful LLM Serving☆97Mar 11, 2025Updated 11 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆40Feb 29, 2024Updated 2 years ago
- C++ RPC based on RDMA☆13Sep 12, 2023Updated 2 years ago
- SJTU CS2951 Computer Architecture Course Project, A Verilog HDL implemented RISC-V CPU.☆10Jan 15, 2022Updated 4 years ago
- COSE: Configuring Serverless Functions using Statistical Learning☆10Jun 28, 2023Updated 2 years ago
- ☆13May 25, 2022Updated 3 years ago
- Fast and memory-efficient exact attention☆16Updated this week
- The code based on vLLM for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”.☆11Sep 19, 2024Updated last year
- ☆10Jul 5, 2023Updated 2 years ago
- ☆11Nov 7, 2019Updated 6 years ago
- Computer Graphics Course (COMP130018.01, 2023 Spring) Project of Fudan University.☆14Aug 4, 2023Updated 2 years ago
- LaTeX Template for Fudan University School of Computer Science 2024☆11May 21, 2024Updated last year
- A Rust-based Unikernel Enhancing Reliability and Efficiency of Embedded Systems.☆11Jun 28, 2024Updated last year
- Source code for the paper: "Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs"☆16Apr 15, 2024Updated last year
- A Tomasulo & Scoreboarding Visual Simulator☆10Nov 19, 2023Updated 2 years ago
- ☆12Mar 18, 2024Updated last year
- ☆13Jan 7, 2025Updated last year
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation☆22May 29, 2025Updated 9 months ago
- ☆25Oct 11, 2025Updated 4 months ago
- Autoware reference system integrated with the PAAM framework☆11Apr 8, 2024Updated last year
- Blogs about Compiler & Virtual Machine☆12Jun 15, 2025Updated 8 months ago
- ☆13May 11, 2023Updated 2 years ago
- Cluster simulator with far memory☆12Apr 28, 2020Updated 5 years ago
- A Dart library to decode & encode NDEF records, supporting multiple types.☆14Nov 6, 2025Updated 3 months ago
- A simulator for the Lightning Network☆13Jun 25, 2020Updated 5 years ago
- Tiered Memory Management Beyond Hotness (OSDI'25)☆32Jul 31, 2025Updated 7 months ago
- Flexible memory allocation tool for multi-tiered memory systems☆13Jan 7, 2026Updated last month
- ☆10Sep 14, 2023Updated 2 years ago
- A list of learning materials to understand databases internals☆10Sep 12, 2021Updated 4 years ago
- ☆87Jan 22, 2026Updated last month
- ☆131Nov 11, 2024Updated last year
- A repository of Dockerfiles, scripts, yaml files, Helm Charts, etc. used to build and scale the sample AI workflows with python, kubernet…☆12Feb 22, 2024Updated 2 years ago
- ☆13Sep 8, 2021Updated 4 years ago
- This repo is used to assess NSL's scientific research assistants.☆17Jul 7, 2025Updated 7 months ago