eddiegaoo/Apt-Serve

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/eddiegaoo/Apt-Serve)

eddiegaoo / Apt-Serve

☆21

Alternatives and similar repositories for Apt-Serve

Users that are interested in Apt-Serve are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

AISys-01 / vllm-CachedAttention
View on GitHub
The code based on vLLM for the paper “ Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”.
☆11Sep 19, 2024Updated last year
hao-ai-lab / vllm-ltr
View on GitHub
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆81Nov 4, 2024Updated last year
Adaxry / Unified_Layer_Skipping
View on GitHub
☆15Apr 11, 2024Updated 2 years ago
James-QiuHaoran / LLM-serving-with-proxy-models
View on GitHub
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an …
☆52Jun 1, 2024Updated 2 years ago
lingfenghsiang / Persistent-Memory-Study
View on GitHub
☆25Mar 31, 2022Updated 4 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Hanchenli / vllm-continuum
View on GitHub
Preview Code for Continuum Paper
☆91Jul 20, 2026Updated last week
junhongmit / P-and-B
View on GitHub
🧠Plan-and-Budget: Training-free test-time reasoning framework for adaptive token allocation in large language models (ICLR 2026).
☆15Mar 2, 2026Updated 4 months ago
lastweek / FlashRL
View on GitHub
☆16Mar 24, 2026Updated 4 months ago
YaoJiayi / CacheBlend
View on GitHub
☆200Jul 15, 2025Updated last year
dywsjtu / apparate
View on GitHub
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆24Nov 21, 2024Updated last year
microsoft / chunk-attention
View on GitHub
☆89Apr 18, 2025Updated last year
zyqCSL / DiffKV
View on GitHub
☆45Oct 11, 2025Updated 9 months ago
Leo9660 / HedraRAG_AE
View on GitHub
Artifact Evaluation for SOSP 2025
☆22Aug 16, 2025Updated 11 months ago
pzs19 / TokenSelect
View on GitHub
☆20Mar 11, 2025Updated last year
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
vivekvar-dl / GSPO-DeepSeek-R1-Distill-Qwen-1.5B
View on GitHub
☆18Mar 15, 2026Updated 4 months ago
SusCom-Lab / ZSMerge
View on GitHub
☆23Sep 24, 2025Updated 10 months ago
ziliuziliu / FaaSGraph
View on GitHub
☆22Mar 2, 2025Updated last year
FFY0 / AdaKV
View on GitHub
The Official Implementation of Ada-KV [NeurIPS 2025]
☆139Nov 26, 2025Updated 8 months ago
All-less / faas-scheduling-benchmark
View on GitHub
A benchmark suite for evaluating FaaS scheduler.
☆23Nov 5, 2022Updated 3 years ago
schencoding / lits
View on GitHub
LITS: An Optimized Learned Index for Strings
☆13Jun 18, 2025Updated last year
Scientific-Computing-Lab / STREAMer
View on GitHub
STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth
☆18Aug 21, 2023Updated 2 years ago
MobiSense / SpecOffload-public
View on GitHub
☆30Feb 3, 2026Updated 5 months ago
bentoml / BentoLMDeploy
View on GitHub
Self-host LLMs with LMDeploy and BentoML
☆22Jul 14, 2026Updated 2 weeks ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
pmem / pmem.github.io
View on GitHub
The pmem.io Website
☆17Jan 20, 2026Updated 6 months ago
gunrock / mini
View on GitHub
mini is mini
☆20Jan 19, 2020Updated 6 years ago
john-hewitt / implicit-ins
View on GitHub
Codebase for Instruction Following without Instruction Tuning
☆36Sep 24, 2024Updated last year
google / rago
View on GitHub
☆31Jun 22, 2025Updated last year
zhangir-azerbayev / MetaMath
View on GitHub
☆11Oct 11, 2023Updated 2 years ago
gty111 / gLLM
View on GitHub
An Efficient and Versatile Inference Engine for Distributed LLM Serving
☆66Updated this week
ARM-software / nomali-model
View on GitHub
A simple Mali 6xx/7xx register interface model that doesn't do any rendering.
☆13Jan 29, 2016Updated 10 years ago
tengxiaoliu / LM_skip
View on GitHub
[NeurIPS 2024] Can Language Models Learn to Skip Steps?
☆21Jan 25, 2025Updated last year
NovelQA / novelqa.github.io
View on GitHub
☆27Jun 4, 2026Updated last month
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
thustorage / GustANN
View on GitHub
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [SIGMOD'26]
☆30Apr 22, 2026Updated 3 months ago
yichuan-w / MLsys_reading_list
View on GitHub
A record of reading list on some MLsys popular topic
☆25Mar 20, 2025Updated last year
sspec-project / SparseSpec
View on GitHub
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆116Dec 2, 2025Updated 7 months ago
BG2BKK / my_benchmark
View on GitHub
benchmark for linux server
☆13Nov 6, 2016Updated 9 years ago
Carrefour / carrefour-runtime
View on GitHub
Carrefour runtime. Uses harwdare counters to decide whether Carrefour needs to be run or not.
☆16Sep 29, 2015Updated 10 years ago
jiwonsong-dev / ReasoningPathCompression
View on GitHub
[NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆32Oct 20, 2025Updated 9 months ago
GoodwillComputingLab / CLITE
View on GitHub
☆10Mar 14, 2020Updated 6 years ago