pie-project/pie

Pie: Programmable LLM Serving

☆184

Alternatives and similar repositories for pie

Users who are interested in pie often compare it with the repositories listed below.

hopter-project / hopter
A Rust-based embedded operating system designed to enable memory-safe, robust, and responsive embedded applications.
☆85 · Apr 14, 2025 · Updated last year
NetX-lab / Ayo
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo
☆75 · Mar 11, 2026 · Updated 4 months ago
lsds / Tempo
Tempo is a system for declarative, efficient, end-to-end compiled dynamic deep learning
☆30 · Oct 21, 2025 · Updated 8 months ago
infinigence / HamiltonAttention
☆45 · Oct 15, 2025 · Updated 9 months ago
LLMServe / hydraserve
☆20 · May 11, 2026 · Updated 2 months ago
caoshiyi / artifacts
☆40 · Nov 28, 2024 · Updated last year
chenyu-jiang / dcp
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21 · Nov 28, 2025 · Updated 7 months ago
efeslab / AgentFlux
☆20 · Dec 4, 2025 · Updated 7 months ago
oliverYoung2001 / UltraAttn
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16 · Aug 14, 2025 · Updated 11 months ago
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆161 · May 11, 2026 · Updated 2 months ago
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆319 · Jul 6, 2026 · Updated 2 weeks ago
Hanchenli / vllm-continuum
Preview Code for Continuum Paper
☆89 · Jul 13, 2026 · Updated last week
blitz-serving / blitz-scale
The official implementation of OSDI'25 paper BlitzScale
☆48 · Apr 15, 2026 · Updated 3 months ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆511 · Jan 8, 2026 · Updated 6 months ago
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆968 · Mar 29, 2026 · Updated 3 months ago
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆222 · Sep 21, 2024 · Updated last year
scale-snu / layered-prefill
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall fre…
☆18 · Mar 9, 2026 · Updated 4 months ago
verl-project / vexact
verl Zero-Mismatch Dense/MoE HuggingFace Rollout
☆61 · Updated this week
AmberLJC / LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
☆2,195 · Updated this week
zyqCSL / DiffKV
☆44 · Oct 11, 2025 · Updated 9 months ago
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
casys-kaist / casys-kaist.github.io
☆39 · Jun 4, 2026 · Updated last month
thustorage / Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆47 · May 13, 2025 · Updated last year
HPMLL / BurstGPT
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆279 · Jun 30, 2026 · Updated 3 weeks ago
madsys-dev / smart
Scaling Up Memory Disaggregated Applications with SMART
☆35 · Apr 23, 2024 · Updated 2 years ago
wu-kan / wuk_cupti_wrapper
a simple API to use CUPTI
☆10 · Aug 19, 2025 · Updated 11 months ago
mit-han-lab / fastrl
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆174 · Feb 27, 2026 · Updated 4 months ago
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆1,106 · Updated this week
Thesys-lab / Helix-ASPLOS25
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
☆93 · Oct 15, 2025 · Updated 9 months ago
uw-syfi / vibesys
Can AI Agents Build Bespoke Systems?
☆84 · Updated this week
ruipeterpan / marconi
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
☆63 · Mar 5, 2025 · Updated last year
QLM-project / QLM
☆32 · Jan 16, 2025 · Updated last year
DerekHJH / epic
☆21 · Jul 13, 2026 · Updated last week
eth-easl / sailor
AI model training on heterogeneous, geo-distributed resources
☆46 · Nov 24, 2025 · Updated 7 months ago
wongsingfo / paper-util
Utilities for paper writing.
☆12 · Jan 11, 2026 · Updated 6 months ago
vox-serve / vox-serve
A Streaming-Native Serving Engine for TTS/STS Models
☆74 · Jun 20, 2026 · Updated last month
xlab-uiuc / emt
EMT: An OS Framework for New Memory Translation Architectures
☆36 · Jul 22, 2025 · Updated 11 months ago
yonsei-sslab / asgard
The artifact for NDSS '25 paper "ASGARD: Protecting On-Device Deep Neural Networks with Virtualization-Based Trusted Execution Environmen…
☆16 · Oct 16, 2025 · Updated 9 months ago
RC4ML / RPCNIC
RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]
☆15 · Dec 9, 2024 · Updated last year