vbdi/epdserve

[ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation

☆24

Alternatives and similar repositories for epdserve

Users who are interested in epdserve compare it to the repositories listed below.

hpdps-group / ElasticMM
ElasticMM: Elastic and Efficient MLLM Serving System
☆44 · May 10, 2026 · Updated 2 months ago
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57 · Aug 6, 2025 · Updated 11 months ago
gofreelee / SpaceServe
☆31 · Jul 13, 2026 · Updated last week
pittisl / mPnP-LLM
Code for paper "Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI"
☆13 · Jan 19, 2024 · Updated 2 years ago
modelscope / Katz
[ATC'25] Katz is a high-performance serving system designed specifically for diffusion model workflows with multiple adapters.
☆24 · May 26, 2025 · Updated last year
WukLab / preble
Stateful LLM Serving
☆105 · Mar 11, 2025 · Updated last year
JiangLiSJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
alibaba-damo-academy / Inferix
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
☆133 · Apr 28, 2026 · Updated 2 months ago
oscomp / proj47-tee-os
An OS for trusted execution environments.
☆12 · May 9, 2025 · Updated last year
alibaba / llm-scheduling-artifact
Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆64 · Jun 5, 2024 · Updated 2 years ago
UChi-JCL / CacheGen
☆169 · Oct 9, 2024 · Updated last year
oliverYoung2001 / UltraAttn
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16 · Aug 14, 2025 · Updated 11 months ago
shengshu-ai / TurboServe
TurboServe: Serving Streaming Video Generation Efficiently and Economically
☆34 · Jul 12, 2026 · Updated last week
vbdi / divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
☆86 · Apr 16, 2026 · Updated 3 months ago
heheda12345 / Jenga-SOSP25-AE
☆15 · Aug 16, 2025 · Updated 11 months ago
AIS-SNU / GraNNDis_Artifact
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…
☆10 · Aug 13, 2024 · Updated last year
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆81 · Nov 4, 2024 · Updated last year
Ying1123 / llm-caching-multiplexing
☆19 · Jun 3, 2023 · Updated 3 years ago
knpwrs / Multi-Threaded-In-Place-QuickSort
☆10 · Jul 31, 2019 · Updated 6 years ago
Multi-LLM / prism-research
Research prototype of PRISM: a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71 · Mar 17, 2026 · Updated 4 months ago
cornserve-ai / cornserve
Easy, Fast, and Scalable Multimodal AI
☆128 · Jun 2, 2026 · Updated last month
microsoft / tokenweave
Accepted to MLSys 2026
☆91 · Apr 19, 2026 · Updated 3 months ago
vllm-project / vllm-daily
vLLM Daily Summarization of Merged PRs
☆51 · Updated this week
LoongServe / LoongServe
☆135 · Nov 11, 2024 · Updated last year
DerekHJH / epic
☆21 · Jul 13, 2026 · Updated last week
sjtu-zhao-lab / ParaStep
Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism (NIPS'25)
☆16 · Oct 6, 2025 · Updated 9 months ago
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆130 · Jul 4, 2025 · Updated last year
mi150 / VaLoRA
☆11 · May 19, 2025 · Updated last year
yuanmu97 / PacketGame
[SIGCOMM 2023] PacketGame: Multi-Stream Packet Gating for Concurrent Video Inference at Scale
☆15 · Jul 1, 2023 · Updated 3 years ago
Qiukunpeng / ADC
[MICCAI 2025] Adaptively Distilled ControlNet: Accelerated Training and Superior Sampling for Medical Image
☆15 · Jan 9, 2026 · Updated 6 months ago
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆163 · May 11, 2026 · Updated 2 months ago
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆39 · Aug 29, 2025 · Updated 10 months ago
itmare / alluxio
Organizing alluxio notes my own way (work in progress)
☆11 · May 13, 2019 · Updated 7 years ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆27 · Dec 10, 2022 · Updated 3 years ago
vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Updated this week
timlee0212 / SiDA-MoE
Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"
☆22 · Apr 13, 2024 · Updated 2 years ago
xlite-dev / Awesome-DiT-Inference
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
☆578 · Jun 13, 2026 · Updated last month
startupheroes / startupheroes-checkstyle
StartupHeroes Checkstyle project with additional Checkstyle checks and Sonar Checkstyle plugin
☆10 · Jan 25, 2024 · Updated 2 years ago
federerjiang / Plato
Plato is a system for viewport adaptation based bitrate adaptive VR video streaming.
☆15 · May 1, 2018 · Updated 8 years ago