Multi-LLM / prism-research
Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆27 Updated last month
Alternatives and similar repositories for prism-research
Users who are interested in prism-research are comparing it to the repositories listed below.
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆183 Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆59 Updated 10 months ago
- Stateful LLM Serving