flashserve/PAT

Prefix-Aware Attention for LLM Decoding

☆41

Alternatives and similar repositories for PAT

Users who are interested in PAT are comparing it to the repositories listed below.

flashserve / RAGPulse
An Open-Source RAG Workload Trace to Optimize RAG Serving Systems
☆37 · Nov 18, 2025 · Updated 8 months ago
TJU-NSL / awesome-papers
☆37 · Updated this week
llumnix-project / llumnix-kv
☆33 · Jun 15, 2026 · Updated last month
Leo9660 / HedraRAG_AE
Artifact Evaluation for SOSP 2025
☆21 · Aug 16, 2025 · Updated 11 months ago
alibaba-edu / qwen-bailian-usagetraces-anon
☆150 · Apr 23, 2026 · Updated 2 months ago
sspec-project / SparseSpec
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
☆115 · Dec 2, 2025 · Updated 7 months ago
vortexgpgpu / Volt
☆17 · Feb 9, 2026 · Updated 5 months ago
HPMLL / ZipServ_ASPLOS26
☆50 · Dec 19, 2025 · Updated 7 months ago
zejia-lin / BulletServe
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
☆53 · Jan 8, 2026 · Updated 6 months ago
flashserve / flash-linear-attention-npu
☆27 · Updated this week
Sys-KU / DSA-Linux
[IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA
☆16 · Nov 20, 2024 · Updated last year
DerekHJH / epic
☆21 · Jul 13, 2026 · Updated last week
platformxlab / RAGPerf
An End-to-End Benchmarking Framework for Retrieval-Augmented Generation Systems
☆29 · Mar 13, 2026 · Updated 4 months ago
blitz-serving / trace-replayer
Repo to replay Qwen trace
☆31 · Jan 9, 2026 · Updated 6 months ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆25 · May 12, 2025 · Updated last year
ovg-project / GVM
☆23 · Jan 18, 2026 · Updated 6 months ago
TJU-NSL / NSL-test
This repo is used to assess NSL's scientific research assistants.
☆18 · Jul 7, 2025 · Updated last year
xinhaoc / ferret
Autonomous CUDA kernel optimization agent with structured task specs and per-config scoring
☆17 · Jun 17, 2026 · Updated last month
Deep-Learning-Profiling-Tools / fasten
☆14 · Apr 24, 2024 · Updated 2 years ago
Sys-KU / LMServe
A lightweight and fast LLM serving framework
☆15 · Mar 5, 2026 · Updated 4 months ago
Raphael-Hao / Abacus
☆38 · Jun 27, 2025 · Updated last year
PanZaifeng / FastTree-Artifact
☆32 · Mar 24, 2025 · Updated last year
redbird-arch / isca2025-chimera-artifact
Artifact of Chimera
☆18 · May 6, 2025 · Updated last year
AIS-SNU / GraNNDis_Artifact
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…
☆10 · Aug 13, 2024 · Updated last year
zhixin612 / awesome-papers-LMsys
Daily Arxiv Papers on LLM Systems
☆66 · Updated this week
escalab / RTSpMSpM
☆25 · Apr 13, 2025 · Updated last year
mlsys-io / helium_demo
☆23 · May 2, 2026 · Updated 2 months ago
LMCache / lmcache-agent-trace
Agent application/benchmark/workload traces should be placed here.
☆15 · Apr 13, 2026 · Updated 3 months ago
readwrite112 / AGAThA
PPoPP24 AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
☆22 · May 8, 2024 · Updated 2 years ago
illinois-nsai / dede
DeDe (OSDI '25): an optimization framework for large-scale resource allocation
☆15 · May 18, 2026 · Updated 2 months ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆24 · Nov 21, 2024 · Updated last year
OSU-STARLAB / UVM_benchmark
☆34 · Sep 9, 2020 · Updated 5 years ago
Multi-LLM / prism-research
Research prototype of PRISM, a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing.
☆71 · Mar 17, 2026 · Updated 4 months ago
sail-sg / odc
On demand communication
☆34 · Apr 16, 2026 · Updated 3 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆504 · Updated this week
s3yonsei / blocked_samples
☆43 · Sep 3, 2025 · Updated 10 months ago
csl-iisc / SUV-MICRO24
☆13 · Oct 6, 2024 · Updated last year
scitix / SiMM
SiMM: Scalable in-Memory Middleware
☆41 · Apr 20, 2026 · Updated 3 months ago
Kami-code / ICS-2020-Notes
Notes for the Introduction to Computer Systems (ICS) course at the School of Software, Shanghai Jiao Tong University
☆15 · Feb 7, 2022 · Updated 4 years ago