Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆12 Nov 8, 2024 Updated last year
Alternatives and similar repositories for 25ASPLOS-Medusa
Users who are interested in 25ASPLOS-Medusa are comparing it to the repositories listed below
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆41 May 13, 2025 Updated 9 months ago
- Artifacts for our ASPLOS'23 paper ElasticFlow ☆55 May 10, 2024 Updated last year
- Integrated Training Platform (ITP) traces used in ElasticFlow paper. ☆31 Dec 23, 2022 Updated 3 years ago
- ☆21 Nov 12, 2025 Updated 3 months ago
- Secure and performant OCI-image builder for Kubernetes ☆12 Updated this week
- ☆17 May 27, 2025 Updated 9 months ago
- For Vast.ai hosts. Prometheus exporter reporting data from your Vast.ai account. ☆13 Jan 7, 2026 Updated last month
- ☆20 May 24, 2025 Updated 9 months ago
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch ☆39 Mar 27, 2025 Updated 11 months ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆314 Jun 10, 2025 Updated 8 months ago
- ☆11 Mar 22, 2022 Updated 3 years ago
- a simple API to use CUPTI ☆11 Aug 19, 2025 Updated 6 months ago
- ☆44 Jul 4, 2024 Updated last year
- This is the final project of 2020 DBMS course in SYSU ☆10 Jun 23, 2020 Updated 5 years ago
- Dataset and pre-trained model of EMNLP-IJCNLP 2019 paper "TalkDown: A Corpus for Condescension Detection in Context." ☆10 Jan 26, 2020 Updated 6 years ago
- ☆12 Mar 1, 2025 Updated 11 months ago
- The implementation for maximum clique enumeration algorithm ☆11 Apr 14, 2016 Updated 9 years ago
- ☆10 Sep 14, 2023 Updated 2 years ago
- Processing large file - go ☆10 Sep 9, 2021 Updated 4 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 Dec 30, 2024 Updated last year
- Huawei collective communication performance test ☆15 May 27, 2024 Updated last year
- DiscreteTom's Blog Boilerplate. ☆10 Mar 6, 2023 Updated 2 years ago
- ☆13 Updated this week
- An MLIR-based compiler from C/C++ to AMD-Xilinx Versal AIE ☆18 Aug 5, 2022 Updated 3 years ago
- ☆11 Sep 14, 2020 Updated 5 years ago
- ☆10 Sep 15, 2023 Updated 2 years ago
- A Vue demo: a Vue clone of the NetEase News mobile site ☆10 Jul 26, 2017 Updated 8 years ago
- [ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading ☆13 Jun 28, 2025 Updated 8 months ago
- Synthetic aperture focusing technique for optoacoustic mesoscopy and scanning acoustic microscopy. ☆13 Jul 24, 2024 Updated last year
- Acceleration codes for the Ozaki-scheme on integer matrix multiplication units. ☆21 Dec 10, 2025 Updated 2 months ago
- OpenAI compatible API for open source LLMs ☆16 Oct 30, 2023 Updated 2 years ago
- ☆13 Jul 10, 2024 Updated last year
- A Proof-of-concept CPU profiler written in Go using eBPF ☆12 Mar 6, 2023 Updated 2 years ago
- ☆13 Sep 30, 2022 Updated 3 years ago
- Securing Deep Spiking Neural Networks against Adversarial Attacks through Inherent Structural Parameters ☆13 Aug 15, 2022 Updated 3 years ago
- Benchmark and resources for single super-resolution algorithms ☆10 Apr 14, 2017 Updated 8 years ago
- Repository for AI model benchmarking on TT-Buda ☆15 Feb 9, 2026 Updated 2 weeks ago
- FPGA 2025 SAT Accel: A modern SAT Solver on FPGA Repository ☆14 Mar 13, 2025 Updated 11 months ago
- Machine Learning meets eBPF ☆15 Apr 24, 2023 Updated 2 years ago