Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆58 · Aug 15, 2025 · Updated 6 months ago
Alternatives and similar repositories for prism-research
Users who are interested in prism-research are comparing it to the libraries listed below.
- Prefix-Aware Attention for LLM Decoding · ☆29 · Jan 23, 2026 · Updated last month
- Mako is a low-pause, high-throughput garbage collector designed for memory-disaggregated datacenters. · ☆15 · Sep 2, 2024 · Updated last year
- A framework for generating realistic LLM serving workloads · ☆103 · Oct 9, 2025 · Updated 4 months ago
- ☆10 · Sep 19, 2021 · Updated 4 years ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond · ☆796 · Updated this week
- A language for video analytics · ☆12 · Jan 26, 2023 · Updated 3 years ago
- ☆44 · Updated this week
- An Open-Source RAG Workload Trace to Optimize RAG Serving Systems · ☆35 · Nov 18, 2025 · Updated 3 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning · ☆25 · May 12, 2025 · Updated 9 months ago
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… · ☆51 · Oct 11, 2025 · Updated 4 months ago
- 🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… · ☆127 · Nov 10, 2025 · Updated 3 months ago
- Asynchronous pipeline parallel optimization · ☆19 · Feb 2, 2026 · Updated last month
- ☆36 · Jan 21, 2021 · Updated 5 years ago
- Repository for go shared libraries (for now). · ☆11 · Dec 1, 2025 · Updated 3 months ago
- ☆28 · Dec 3, 2025 · Updated 3 months ago
- Symphony: A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… · ☆30 · Oct 30, 2025 · Updated 4 months ago
- [ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression · ☆34 · Aug 7, 2025 · Updated 6 months ago
- ☆74 · Sep 15, 2025 · Updated 5 months ago
- Dorylus: Affordable, Scalable, and Accurate GNN Training · ☆76 · May 31, 2021 · Updated 4 years ago
- ☆79 · Feb 10, 2026 · Updated 3 weeks ago
- ☆17 · May 27, 2025 · Updated 9 months ago
- This is the code for an agentic RAG method with a dynamic workflow. · ☆12 · Jan 22, 2026 · Updated last month
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning · ☆10 · Apr 28, 2023 · Updated 2 years ago
- A simple MIPS CPU for the BUAA CO course (and now NSCSCC). · ☆10 · May 15, 2021 · Updated 4 years ago
- Code for the paper "Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching" (COLING 2025) · ☆19 · Jan 3, 2026 · Updated last month
- A distributed stream querying engine that provides sub-millisecond stateful queries at millions of queries per second over fast-evolving li… · ☆10 · Jul 18, 2018 · Updated 7 years ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes · ☆48 · May 10, 2024 · Updated last year
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference. · ☆46 · Jun 11, 2025 · Updated 8 months ago
- VQPy: An object-oriented approach to modern video analytics · ☆41 · Oct 28, 2024 · Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. · ☆123 · Dec 25, 2025 · Updated 2 months ago
- ☆18 · Jun 6, 2025 · Updated 8 months ago
- Speeding Up Your Python Codes 1000x · ☆12 · Apr 2, 2025 · Updated 11 months ago
- Nu is a new datacenter system that enables developers to build fungible applications that can use datacenter resources wherever they are. · ☆41 · May 14, 2024 · Updated last year
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023) · ☆13 · Dec 21, 2023 · Updated 2 years ago
- ACM Class 2017 Computer Architecture · ☆10 · Jan 11, 2018 · Updated 8 years ago
- A unified programming framework for high and portable performance across FPGAs and GPUs · ☆11 · Mar 23, 2025 · Updated 11 months ago
- Midas is a memory management system that efficiently and safely harvests idle memory for applications' soft state. · ☆10 · Oct 30, 2024 · Updated last year
- ☆13 · Jan 7, 2025 · Updated last year
- A simple operating system · ☆10 · Sep 9, 2022 · Updated 3 years ago