Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆12Nov 8, 2024Updated last year
Alternatives and similar repositories for 25ASPLOS-Medusa
Users that are interested in 25ASPLOS-Medusa are comparing it to the libraries listed below
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆42May 13, 2025Updated 10 months ago
- Integrated Training Platform (ITP) traces used in ElasticFlow paper.☆31Dec 23, 2022Updated 3 years ago
- Artifacts for our ASPLOS'23 paper ElasticFlow☆56May 10, 2024Updated last year
- ☆29Jun 22, 2025Updated 8 months ago
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- Dataset and pre-trained model of EMNLP-IJCNLP 2019 paper "TalkDown: A Corpus for Condescension Detection in Context."☆10Jan 26, 2020Updated 6 years ago
- ☆10Sep 15, 2023Updated 2 years ago
- ☆11Nov 14, 2023Updated 2 years ago
- A fault-tolerant RDMA-based disaggregated key-value store with 1-RTT UPDATEs and GETs thanks to the SWARM replication protocol☆14Sep 25, 2024Updated last year
- A translation of "Process Scheduling in Linux" by Volker Seeker of the University of Edinburgh, describing the task scheduling mechanism of Linux 3.1.☆11Aug 11, 2016Updated 9 years ago
- ☆19May 27, 2025Updated 9 months ago
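- Database examples: 1. Inserting and querying time fields with time and date functions. 2. Implementing the observer pattern for a database with ContentProvider, CursorLoader, and SQLite. 3. Implementing the observer pattern for a database with RxJava and SQLBrite. 4. Copying an external db file into the database.☆21May 11, 2017Updated 8 years ago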
- Deduplication over disaggregated memory for Serverless Computing☆14Mar 21, 2022Updated 3 years ago
- ☆11Mar 22, 2022Updated 3 years ago
- Anonymous chat website implemented with WebSocket (an anonymous online chat and friend-making site)☆16Sep 17, 2015Updated 10 years ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …☆320Jun 10, 2025Updated 9 months ago
- This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …☆17Apr 1, 2025Updated 11 months ago
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"☆78Oct 15, 2025Updated 5 months ago
- ☆21Aug 27, 2016Updated 9 years ago
- ☆13Sep 8, 2021Updated 4 years ago
- ☆15May 23, 2023Updated 2 years ago
- Linux kernel SGX driver for Graphene☆12Nov 3, 2020Updated 5 years ago
- This is the final project of 2020 DBMS course in SYSU☆10Jun 23, 2020Updated 5 years ago
- We present a set of all-reduce compatible gradient compression algorithms which significantly reduce the communication overhead while mai…☆10Nov 14, 2021Updated 4 years ago
- Source code for Jellyfish, a soft real-time inference serving system☆15Dec 20, 2022Updated 3 years ago
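- A Connect6 program written in Java with a built-in AI player, implemented mainly with alpha-beta search plus an evaluation function. It has some bugs, and the AI plays reasonably well.☆12Jul 24, 2021Updated 4 years ago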
- Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores (EuroSys'25)☆15Jul 17, 2025Updated 8 months ago
- The /etc/resolv.conf file parser in Rust☆33Dec 29, 2025Updated 2 months ago
- ☆14Dec 5, 2024Updated last year
- A DAG processor and compiler for a tree-based spatial datapath.☆16Aug 24, 2022Updated 3 years ago
- ☆11Jun 5, 2024Updated last year
- Codebase for the Progressive Mixed-Precision Decoding paper.☆19Jul 15, 2025Updated 8 months ago
- ☆11Dec 18, 2020Updated 5 years ago
- DRAM/SSD hybrid caching system☆15Mar 13, 2025Updated last year
- ☆15Dec 13, 2024Updated last year
- ☆12Jun 29, 2024Updated last year
- ☆15Dec 2, 2022Updated 3 years ago
- An implementation of Raft Consensus Algorithm in Elixir☆21Apr 3, 2016Updated 9 years ago
- NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading☆90Jun 16, 2025Updated 9 months ago