☆20Jun 9, 2025Updated 9 months ago
Alternatives and similar repositories for Apt-Serve
Users that are interested in Apt-Serve are comparing it to the libraries listed below
- The code based on vLLM for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”.☆11Sep 19, 2024Updated last year
- A reading list on some popular MLSys topics☆22Mar 20, 2025Updated 11 months ago
- A fast text search engine built for SSDs, written in C++.☆11Aug 29, 2022Updated 3 years ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank☆72Nov 4, 2024Updated last year
- Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an …☆47Jun 1, 2024Updated last year
- Self-host LLMs with LMDeploy and BentoML☆22Dec 26, 2025Updated 2 months ago
- (ACL 2025 Main) Code for MultiAgentBench : Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019…☆38Jun 21, 2025Updated 8 months ago
- ☆167Jul 15, 2025Updated 7 months ago
- A benchmark suite for evaluating FaaS schedulers.☆23Nov 5, 2022Updated 3 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]☆24Nov 21, 2024Updated last year
- ☆26Mar 31, 2022Updated 3 years ago
- A user-friendly Command & Control (C&C) web platform for remote monitoring, management, and task automation across multiple devices.☆14Dec 15, 2024Updated last year
- Asynchronous pipeline parallel optimization☆19Feb 2, 2026Updated last month
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆42May 13, 2025Updated 9 months ago
- ☆85Apr 18, 2025Updated 10 months ago
- Prefix-Aware Attention for LLM Decoding☆29Jan 23, 2026Updated last month
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…☆30Oct 30, 2025Updated 4 months ago
- ☆14Jun 10, 2025Updated 9 months ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago
- LITS: An Optimized Learned Index for Strings☆13Jun 18, 2025Updated 8 months ago
- A distributed stream querying engine that provides sub-millisecond stateful query at millions of queries per-second over fast-evolving li…☆10Jul 18, 2018Updated 7 years ago
- A simple MIPS CPU for BUAA CO course (and now NSCSCC).☆10May 15, 2021Updated 4 years ago
- Code for an agentic RAG method with a dynamic workflow.☆12Jan 22, 2026Updated last month
- Dynamic Context Selection for Efficient Long-Context LLMs☆56May 20, 2025Updated 9 months ago
- Source code for Spitfire: A Three-Tier Buffer Manager for Volatile and Non-Volatile Memory☆39Jan 7, 2023Updated 3 years ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation☆92Updated this week
- 🤖 Basic tools built for scraping and embedding text. The main technologies include OpenAI embeddings, Supabase, and Next.js.☆16Apr 12, 2023Updated 2 years ago
- API server for F5-TTS☆20Jan 24, 2026Updated last month
- paper and code for New Directions in Cloud Programming, CIDR 2021☆11Feb 17, 2021Updated 5 years ago
- Brand new TTS solution☆11Dec 7, 2024Updated last year
- Discover the simplicity and efficiency of Void Linux on your Android device with VoidMagic! 🚀☆10Dec 11, 2023Updated 2 years ago
- Read the source code of BoltDB and re-implement it in C++☆12Jun 2, 2018Updated 7 years ago
- A sd-webui extension for utilizing DanTagGen to "upsample prompts".☆13Jun 13, 2024Updated last year
- Self-implemented coroutines and a scheduler built on Boost.Context, used to build an RPC framework.☆10May 9, 2025Updated 10 months ago
- A 100% locally run AI web tool for generating WeChat replies using the RWKV runner☆10Oct 29, 2024Updated last year
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder.☆13Jan 2, 2024Updated 2 years ago
- Provides a minimal API for "fadian" (发电) copypasta text and some Yunzai (云崽) plugins☆11Jan 17, 2025Updated last year
- VIVOTO is a simple Android video and photo editor that can remove any object you want. In this app, you can use trim…☆11Jun 16, 2020Updated 5 years ago
- Containerized self-hosted REST API for vision classification, utilizing Hugging Face transformers.☆10Dec 5, 2024Updated last year