☆20 · Jun 9, 2025 · Updated 9 months ago
Alternatives and similar repositories for Apt-Serve
Users who are interested in Apt-Serve are comparing it to the libraries listed below.
- Code, based on vLLM, for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”. ☆11 · Sep 19, 2024 · Updated last year
- A fast text search engine built for SSDs, written in C++. ☆11 · Aug 29, 2022 · Updated 3 years ago
- A reading list on some popular MLSys topics ☆22 · Mar 20, 2025 · Updated 11 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆72 · Nov 4, 2024 · Updated last year
- Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an … ☆47 · Jun 1, 2024 · Updated last year
- Self-host LLMs with LMDeploy and BentoML ☆22 · Dec 26, 2025 · Updated 2 months ago
- (ACL 2025 Main) Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019… ☆38 · Jun 21, 2025 · Updated 8 months ago
- ☆167 · Jul 15, 2025 · Updated 7 months ago
- A benchmark suite for evaluating FaaS schedulers. ☆23 · Nov 5, 2022 · Updated 3 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆24 · Nov 21, 2024 · Updated last year
- ☆26 · Mar 31, 2022 · Updated 3 years ago
- Asynchronous pipeline parallel optimization ☆19 · Feb 2, 2026 · Updated last month
- Tutorial for Ray ☆36 · Mar 31, 2024 · Updated last year
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆42 · May 13, 2025 · Updated 9 months ago
- Prefix-Aware Attention for LLM Decoding ☆29 · Jan 23, 2026 · Updated last month
- A simple MIPS CPU for the BUAA CO course (and now NSCSCC). ☆10 · May 15, 2021 · Updated 4 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 · Apr 28, 2023 · Updated 2 years ago
- A distributed stream querying engine that provides sub-millisecond stateful queries at millions of queries per second over fast-evolving li… ☆10 · Jul 18, 2018 · Updated 7 years ago
- LITS: An Optimized Learned Index for Strings ☆13 · Jun 18, 2025 · Updated 8 months ago
- d3LLM: Ultra-Fast Diffusion LLM 🚀 ☆98 · Feb 4, 2026 · Updated last month
- Source code for Spitfire: A Three-Tier Buffer Manager for Volatile and Non-Volatile Memory ☆39 · Jan 7, 2023 · Updated 3 years ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆92 · Updated this week
- A 100% locally run AI web tool for generating WeChat replies using the RWKV runner ☆10 · Oct 29, 2024 · Updated last year
- Provides a minimal API for generating "fadian" (发电) fan-gushing copy, plus some Yunzai (云崽) plugins ☆11 · Jan 17, 2025 · Updated last year
- Paper and code for "New Directions in Cloud Programming", CIDR 2021 ☆11 · Feb 17, 2021 · Updated 5 years ago
- A brand-new TTS solution ☆11 · Dec 7, 2024 · Updated last year
- My notes on reading LevelDB ☆11 · Apr 19, 2024 · Updated last year
- An sd-webui extension for using DanTagGen to "upsample prompts". ☆13 · Jun 13, 2024 · Updated last year
- A simple API for using CUPTI ☆11 · Aug 19, 2025 · Updated 6 months ago
- API server for F5-TTS ☆20 · Jan 24, 2026 · Updated last month
- A library for simplifying training with multi-GPU setups in the HuggingFace / PyTorch ecosystem. ☆16 · Jan 9, 2026 · Updated 2 months ago
- ComfyUI-VRAM-Manager is an independent memory-management custom node for ComfyUI. Provides Distorch memory management functionality for e… ☆22 · Jan 23, 2026 · Updated last month
- ☆11 · Sep 12, 2023 · Updated 2 years ago
- ACM Class 2017 Computer Architecture ☆10 · Jan 11, 2018 · Updated 8 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Serverless LLM Serving for Everyone. ☆662 · Updated this week
- Repo for our AKBC-2021 paper: Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering ☆10 · Oct 10, 2021 · Updated 4 years ago
- Linux kernel technical documentation ☆16 · Feb 26, 2026 · Updated last week
- A powerful extension for ComfyUI that enables adding notes to any node in your workflow. ☆13 · Apr 20, 2025 · Updated 10 months ago