Multi-LLM/prism-research

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Multi-LLM/prism-research)

Multi-LLM / prism-research

Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.

☆71

Alternatives and similar repositories for prism-research

Users that are interested in prism-research are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ovg-project / GVM
View on GitHub
☆23Jan 18, 2026Updated 6 months ago
ovg-project / kvcached
View on GitHub
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆1,107Updated this week
uclasystem / VQPy
View on GitHub
A language for video analytics
☆12Jan 26, 2023Updated 3 years ago
mit-han-lab / SMEPO
View on GitHub
☆16May 27, 2026Updated last month
Infini-AI-Lab / vortex_torch
View on GitHub
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
☆68Jun 24, 2026Updated 3 weeks ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
SiriusInfTra / Sirius
View on GitHub
☆18Sep 21, 2025Updated 10 months ago
thustorage / Medusa
View on GitHub
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆47May 13, 2025Updated last year
alibaba / ServeGen
View on GitHub
A framework for generating realistic LLM serving workloads
☆163May 11, 2026Updated 2 months ago
sspec-project / SparseSpec
View on GitHub
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆115Dec 2, 2025Updated 7 months ago
leehomyc / SlidesAI
View on GitHub
Generate presentation slides automatically using AI from text, PDFs, or structured content.
☆17Mar 17, 2026Updated 4 months ago
s4-lab-cuhksz / torpor
View on GitHub
☆21May 27, 2025Updated last year
microsoft / vattention
View on GitHub
Dynamic Memory Management for Serving LLMs without PagedAttention
☆504Updated this week
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57Aug 6, 2025Updated 11 months ago
QLM-project / QLM
View on GitHub
☆32Jan 16, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
uclasystem / Mako
View on GitHub
Mako is a low-pause, high-throughput garbage collector designed for memory-disaggregated datacenters.
☆15Sep 2, 2024Updated last year
DiT-Serving / TetriServe
View on GitHub
[ASPLOS' 26] TetriServe: Efficiently Serving Mixed DiT Workloads
☆17Mar 12, 2026Updated 4 months ago
qiaokang92 / poise
View on GitHub
Poise source code repo
☆12Aug 12, 2020Updated 5 years ago
LoongServe / LoongServe
View on GitHub
☆135Nov 11, 2024Updated last year
rh-aiservices-bu / sardeenz
View on GitHub
Sardeenz is a proof-of-concept application that allows you to load more than one model on a given GPU. It allows you to add more and more…
☆60Jun 9, 2026Updated last month
Edenzzzz / claude-history-sync
View on GitHub
Synchronizing Claude Code conversations across machines
☆16Jul 3, 2026Updated 2 weeks ago
Terra-Flux / PolyRL
View on GitHub
[NSDI'26] PolyRL is a reinforcement learning framework for LLM that harvest spot instances on the cloud to reduce cost.
☆19Mar 30, 2026Updated 3 months ago
vllm-project / tml-fa4
View on GitHub
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17Updated this week
huangyibo / SwiftRDMA
View on GitHub
SwiftRDMA -- Exposing RDMA NIC Resources for Software-Defined RDMA Scheduling
☆53Jun 9, 2026Updated last month
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
zejia-lin / BulletServe
View on GitHub
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
☆53Jan 8, 2026Updated 6 months ago
vqpy / vqpy
View on GitHub
VQPy: An object-oriented approach to modern video analytics
☆42Oct 28, 2024Updated last year
wjy99-c / QDiff
View on GitHub
☆10Sep 19, 2021Updated 4 years ago
Infini-AI-Lab / Sparrow
View on GitHub
☆16Jun 15, 2026Updated last month
LMCache / lmcache-agent-trace
View on GitHub
Agent application/benchmark/workload traces should be placed here.
☆15Apr 13, 2026Updated 3 months ago
JiangLiSJTU / token-ring
View on GitHub
☆13Jan 7, 2025Updated last year
atomicapple0 / libsmctrl
View on GitHub
Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER.
☆67Nov 24, 2025Updated 7 months ago
PDZZXL / Awesome-LLM-Serving
View on GitHub
Large Language Model (LLM) Serving Paper and Resource List
☆29Updated this week
jianuo-huang / Domino
View on GitHub
Official implementation of “Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding”.
☆121Updated this week
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
HPMLL / ZipServ_ASPLOS26
View on GitHub
☆50Dec 19, 2025Updated 7 months ago
smart-lty / nano-PEARL
View on GitHub
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆210Mar 18, 2026Updated 4 months ago
kvcache-ai / TrEnv-X
View on GitHub
☆95Sep 15, 2025Updated 10 months ago
Sys-KU / DeepPlan
View on GitHub
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆56Aug 6, 2025Updated 11 months ago
alibaba-edu / qwen-bailian-usagetraces-anon
View on GitHub
☆150Apr 23, 2026Updated 2 months ago
uccl-project / mKernel
View on GitHub
mKernel: fast multi-node, multi-GPU fused kernels
☆252Jun 21, 2026Updated last month
blitz-serving / blitz-scale
View on GitHub
The official implementation of OSDI'25 paper BlitzScale
☆48Apr 15, 2026Updated 3 months ago