thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆19Updated 3 months ago
Alternatives and similar repositories for Medusa:
Users who are interested in Medusa are comparing it to the repositories listed below.
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"☆27Updated 3 months ago
- PetPS: Supporting Huge Embedding Models with Tiered Memory☆30Updated 10 months ago
- A Program-Behavior-Guided Far Memory System☆34Updated last year
- Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory☆38Updated last year
- [OSDI 2024] Motor: Enabling Multi-Versioning for Distributed Transactions on Disaggregated Memory☆47Updated last year
- ☆53Updated 4 years ago
- ☆14Updated 8 months ago
- The Artifact Evaluation Version of SOSP Paper #19☆45Updated 7 months ago
- This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value …☆19Updated 5 months ago
- ☆16Updated 10 months ago
- TeRM: Extending RDMA-Attached Memory with SSD [FAST'24]☆40Updated 5 months ago
- ☆32Updated 9 months ago
- Scaling Up Memory Disaggregated Applications with SMART☆27Updated 10 months ago
- ☆23Updated last year
- Deduplication over disaggregated memory for Serverless Computing☆12Updated 3 years ago
- The source code of INFless, a native serverless platform for AI inference.☆36Updated 2 years ago
- ☆16Updated 2 years ago
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…☆92Updated 2 years ago
- ☆69Updated 3 years ago
- A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.☆42Updated 2 years ago
- [FAST 2022] FORD: Fast One-sided RDMA-based Distributed Transactions for Disaggregated Persistent Memory☆60Updated 9 months ago
- Artifacts for our ASPLOS'23 paper ElasticFlow☆52Updated 10 months ago
- Nu is a new datacenter system that enables developers to build fungible applications that can use datacenter resources wherever they are.☆38Updated 10 months ago
- Website for Artifact Evaluation at EuroSys, SOSP, OSDI, ATC☆35Updated this week
- Code for "Baleen: ML Admission & Prefetching for Flash Caches" (FAST 2024).☆23Updated last year
- Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.☆49Updated 2 years ago
- Hermit: Low-Latency, High-Throughput, and Transparent Remote Memory via Feedback-Directed Asynchrony☆34Updated 9 months ago
- This is the implementation repository of our FAST'23 paper: FUSEE: A Fully Memory-Disaggregated Key-Value Store.☆55Updated 2 years ago
- Exploring the Design Space of Page Management for Multi-Tiered Memory Systems (USENIX ATC '21)☆45Updated 2 years ago