thustorage / MedusaLinks

Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]

☆40

Alternatives and similar repositories for Medusa

Users that are interested in Medusa are comparing it to the libraries listed below

Sorting:

alibaba-edu / qwen-bailian-usagetraces-anon
☆61Updated last month
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆73Updated last month
msr-fiddle / CheckFreq
☆57Updated 4 years ago
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆54Updated 2 years ago
LLMServe / dLoRA-artifact
☆27Updated last year
rkhan055 / SHADE
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
☆35Updated 2 years ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆55Updated last year
NEO-MLSys25 / NEO
NEO is a LLM inference engine built to save the GPU memory crisis by CPU offloading
☆69Updated 5 months ago
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
☆64Updated last year
SJTU-IPADS / ugache
☆23Updated 2 years ago
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆103Updated 2 years ago
csl-iisc / GPM-ASPLOS22
☆36Updated last year
thustorage / PetPS
PetPS: Supporting Huge Embedding Models with Tiered Memory
☆33Updated last year
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43Updated 3 years ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46Updated 2 years ago
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆37Updated 3 years ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆91Updated 2 years ago
Hsword / Awesome-Machine-Learning-System-Papers
☆79Updated 3 years ago
jasperzhong / swift
☆15Updated 3 years ago
gudiandian / ElasticFlow
☆16Updated last year
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆87Updated last month
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25Updated last year
hongzhangblaze / CS854-F24
☆52Updated 2 months ago
WukLab / Mira
A Progam-Behavior-Guided Far Memory System
☆35Updated 2 years ago
minghust / motor
[OSDI 2024] Motor: Enabling Multi-Versioning for Distributed Transactions on Disaggregated Memory
☆50Updated last year
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆132Updated last year
dmemsys / Aceso
This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value …
☆22Updated last year
nicexlab / GeminiFS
GeminiFS: A Companion File System for GPUs
☆66Updated 9 months ago
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆220Updated 4 months ago
dmemsys / SMART
This is the implementation repository of our OSDI'23 paper: SMART: A High-Performance Adaptive Radix Tree for Disaggregated Memory.
☆64Updated last year