Prefix-Aware Attention for LLM Decoding
☆30 Jan 23, 2026 Updated last month
Alternatives and similar repositories for PAT
Users who are interested in PAT are comparing it to the libraries listed below
- An Open-Source RAG Workload Trace to Optimize RAG Serving Systems ☆35 Nov 18, 2025 Updated 4 months ago
- ☆14 Feb 13, 2026 Updated last month
- ☆12 Dec 1, 2023 Updated 2 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 Apr 28, 2023 Updated 2 years ago
- A simple demo for using Sentinel with Spring Cloud Alibaba ☆16 Nov 8, 2018 Updated 7 years ago
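- Embedded systems lab reports for the Network Engineering program at Beijing University of Posts and Telecommunications (BUPT) ☆12 Jan 7, 2021 Updated 5 years ago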
- Important experiments on memory management, file access, network transfer, job scheduler, and so on. ☆15 Apr 27, 2022 Updated 3 years ago
- BUPT Neural Networks and Deep Learning course project ☆10 Dec 29, 2023 Updated 2 years ago
- [AFK] Hardware router in Chisel (THU Network Joint Lab 2020) ☆14 Oct 8, 2020 Updated 5 years ago
- alibaba/Sentinel zuul integration sample ☆11 Oct 20, 2018 Updated 7 years ago
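- 波普特 (Bopute) Hotel air-conditioning management system ☆14 Jun 14, 2020 Updated 5 years ago
- BUPT NLP assignment 2: building and evaluating Chinese subword vectors using two methods, SVD decomposition and SGNS ☆10 May 16, 2023 Updated 2 years ago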
- ☆93 Jan 22, 2026 Updated last month
- ☆20 Jun 1, 2025 Updated 9 months ago
- Reading seminar in Harvard Cloud Networking and Systems Group ☆16 Aug 29, 2022 Updated 3 years ago
- Source code for the BUPT Deep Learning and Neural Networks course project (2021 cohort) ☆15 Jan 21, 2024 Updated 2 years ago
- BUPT Software Engineering Project ☆18 Aug 20, 2018 Updated 7 years ago
- ☆17 May 10, 2024 Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 Feb 29, 2024 Updated 2 years ago
- BATCH: Adaptive Batching for Efficient Machine Learning Serving on Serverless Platforms ☆11 Aug 7, 2021 Updated 4 years ago
- ☆36 Dec 9, 2025 Updated 3 months ago
- COSCon Workshop on ECharts ☆18 Oct 18, 2018 Updated 7 years ago
- Research prototype of PRISM, a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing. ☆58 Aug 15, 2025 Updated 7 months ago
- ☆15 Aug 15, 2024 Updated last year
- ☆39 Oct 11, 2025 Updated 5 months ago
- ☆16 Apr 22, 2025 Updated 10 months ago
- Training-track project for the training camp ☆26 Jan 28, 2026 Updated last month
- This repo is used to assess NSL's scientific research assistants. ☆18 Jul 7, 2025 Updated 8 months ago
- ☆20 Jun 3, 2023 Updated 2 years ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆19 May 28, 2024 Updated last year
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling ☆16 Sep 27, 2023 Updated 2 years ago
- A Streaming-Native Serving Engine for TTS/STS Models ☆60 Feb 22, 2026 Updated 3 weeks ago
- [ICML2024] Adaptive Text Watermark for Large Language Models ☆25 Dec 11, 2024 Updated last year
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆75 Sep 15, 2025 Updated 6 months ago
- ☆31 Updated this week
- AI model training on heterogeneous, geo-distributed resources ☆39 Nov 24, 2025 Updated 3 months ago
- Spring Cloud Alibaba, Dubbo, Alibaba Cloud, and more. ☆33 Nov 16, 2018 Updated 7 years ago
- BytePS examples (Vision, NLP, GAN, etc) ☆19 Nov 24, 2022 Updated 3 years ago
- This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value … ☆24 Oct 20, 2024 Updated last year