James-QiuHaoran / LLM-serving-with-proxy-modelsLinks

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)

☆46

Alternatives and similar repositories for LLM-serving-with-proxy-models

Users that are interested in LLM-serving-with-proxy-models are comparing it to the libraries listed below

Sorting:

HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆213Updated 2 months ago
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆129Updated last year
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆67Updated last week
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆155Updated 3 weeks ago
WukLab / preble
Stateful LLM Serving
☆86Updated 7 months ago
pkusys / ElasticFlow
Artifacts for our ASPLOS'23 paper ElasticFlow
☆53Updated last year
LLMServe / dLoRA-artifact
☆26Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
☆62Updated last year
NetX-lab / Ayo
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo
☆47Updated 2 months ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆88Updated 2 years ago
msr-fiddle / blox
☆43Updated last year
S-Lab-System-Group / Lucid
Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
☆55Updated 2 years ago
Ying1123 / VTC-artifact
☆34Updated last year
SymbioticLab / Oobleck
A resilient distributed training framework
☆95Updated last year
YaoJiayi / CacheBlend
☆141Updated 3 months ago
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆126Updated last week
TankLabTJU / INFless
The source code of INFless，a native serverless platform for AI inference.
☆41Updated 3 years ago
msr-fiddle / synergy
☆51Updated 2 years ago
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆13Updated last year
JF-D / Parcae
☆21Updated last year
hao-ai-lab / MuxServe
☆74Updated this week
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆147Updated 8 months ago
kungfu-team / tenplex
Dynamic resources changes for multi-dimensional parallelism training
☆29Updated 2 months ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35Updated 2 years ago
Hsword / Awesome-Machine-Learning-System-Papers
☆77Updated 3 years ago
NEO-MLSys25 / NEO
NEO is a LLM inference engine built to save the GPU memory crisis by CPU offloading
☆67Updated 4 months ago
LoongServe / LoongServe
☆124Updated 11 months ago
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆51Updated 2 years ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46Updated last year