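Experimentally compares the throughput of the Prefill and Decoding stages of LLM inference, exposes the resulting performance bottleneck, and explains the principle behind the PD (prefill/decode) disaggregation optimization. Includes test scripts for CUDA and Apple MPS (M-series chips).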
☆20 · May 22, 2025 · Updated 9 months ago
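The compute-bound versus memory-bound split that motivates PD disaggregation can be sketched with a back-of-the-envelope roofline estimate. This is not the repository's benchmark script. It is a minimal illustration that assumes a single fp16 linear layer (4096×4096), one sequence with no batching, no cache reuse, and a hypothetical accelerator with 100 TFLOP/s peak compute and 1 TB/s memory bandwidth. The function names are illustrative only.

```python
def linear_arithmetic_intensity(n_tokens, d_in, d_out, bytes_per_elem=2):
    """FLOPs per byte moved for Y = X @ W, with X: (n_tokens, d_in), W: (d_in, d_out).

    Assumption: one multiply-add counts as 2 FLOPs, and every weight, input and
    output element is read or written exactly once (no cache reuse modelled).
    """
    flops = 2 * n_tokens * d_in * d_out
    bytes_moved = bytes_per_elem * (d_in * d_out + n_tokens * d_in + n_tokens * d_out)
    return flops / bytes_moved


def bound_regime(intensity, peak_flops, mem_bandwidth):
    """Roofline classification: at or above the ridge point the kernel is compute-bound."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte where the two limits meet
    return "compute-bound" if intensity >= ridge else "memory-bound"


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s fp16 peak, 1 TB/s memory bandwidth.
    PEAK, BW = 100e12, 1e12
    for phase, n in [("decode (1 new token)", 1), ("prefill (2048-token prompt)", 2048)]:
        ai = linear_arithmetic_intensity(n, 4096, 4096)
        print(f"{phase}: {ai:.1f} FLOP/byte -> {bound_regime(ai, PEAK, BW)}")
```

Under these assumptions, decoding one token performs roughly 1 FLOP per byte of weights loaded, far below the ridge point, while a 2048-token prefill reaches 1024 FLOP/byte. Batching more decode sequences raises their intensity, which is one reason serving systems schedule the two phases differently.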
Alternatives and similar repositories for LLM-Prefill-Decode-Benchmark
Users that are interested in LLM-Prefill-Decode-Benchmark are comparing it to the libraries listed below
- Some microbenchmarks and design docs before commencement ☆12 · Feb 1, 2021 · Updated 5 years ago
- A Texas Hold'em poker framework written in C++20. ☆11 · Apr 23, 2023 · Updated 2 years ago
- 🌿 Quickly generates a folder's directory tree, with configurable depth and optional output to a markdown file. ☆13 · Oct 19, 2022 · Updated 3 years ago
- ☆12 · Jul 24, 2024 · Updated last year
- A QA system based on k8s-specific knowledge, built on ChatGLM2-6B and served with Ray. ☆10 · Sep 14, 2023 · Updated 2 years ago
- ☆12 · Mar 31, 2021 · Updated 4 years ago
- ☆13 · Jan 7, 2025 · Updated last year
- nd009-cn-advanced-p5, for the Udacity CN MLND P5 project ☆14 · Jun 27, 2022 · Updated 3 years ago
- An implementation of the AlphaZero algorithm for adversarial games to be used with the machine learning framework of your choice ☆12 · Aug 30, 2020 · Updated 5 years ago
- Learn how to use Shiboken2 with your own custom Qt based library ☆11 · Nov 3, 2021 · Updated 4 years ago
- 🚀 LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases. ☆13 · Jul 12, 2025 · Updated 7 months ago
- Convex Formulation of Multiple Instance Learning from Positive and Unlabeled Bags ☆10 · Apr 28, 2018 · Updated 7 years ago
- ☆13 · Sep 8, 2024 · Updated last year
- [AAAI 2026] AutoTool: Efficient Tool Selection for Large Language Model Agents ☆29 · Dec 28, 2025 · Updated 2 months ago
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- CFR-based Texas Hold'em AI ☆11 · Jan 30, 2021 · Updated 5 years ago
- Swarm learning algorithm ☆11 · Jun 2, 2021 · Updated 4 years ago
- Simple starter CMake project that uses NVBench. ☆15 · May 6, 2025 · Updated 9 months ago
- ☆16 · Jul 13, 2022 · Updated 3 years ago
- CUDA C simple application for Nvidia's GPU ☆11 · Jun 7, 2022 · Updated 3 years ago
- An Empirical Study of Memorization in NLP (ACL 2022) ☆13 · Jun 22, 2022 · Updated 3 years ago
- Tuning the PI controller parameters by using a contextual bandit approach ☆15 · Jan 13, 2022 · Updated 4 years ago
- Reinforcement learning training project for a SLG game ☆13 · Dec 21, 2017 · Updated 8 years ago
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo ☆11 · Feb 12, 2023 · Updated 3 years ago
- Official PyTorch implementation for the ICML 2023 paper "Out-of-Distribution Generalization of Federated Learning via Implicit Invariant … ☆13 · Oct 31, 2023 · Updated 2 years ago
- Code from the CMU LM inference fall 2025 edition. ☆34 · Dec 7, 2025 · Updated 2 months ago
- https://bbuf.github.io/gpu-glossary-zh/ ☆26 · Nov 7, 2025 · Updated 3 months ago
- A Simple Game Using Unity ML-Agents ☆10 · Nov 20, 2020 · Updated 5 years ago
- A mini program built to promote products for the Pinduoduo affiliate program (拼多多联盟) ☆12 · Nov 30, 2018 · Updated 7 years ago
- Mathematical expression evaluator with just in time code generation. ☆12 · Apr 7, 2013 · Updated 12 years ago
- Source code and data for the EDM 2022 paper ☆12 · May 16, 2022 · Updated 3 years ago
- Implementation of elo rating for large competitions ☆10 · Nov 25, 2016 · Updated 9 years ago
- ☆12 · Oct 18, 2019 · Updated 6 years ago
- Inline PTX Assembly in CUDA example ☆13 · May 7, 2022 · Updated 3 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆13 · Feb 11, 2026 · Updated 3 weeks ago
- Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking ☆13 · Feb 5, 2023 · Updated 3 years ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆18 · Feb 29, 2024 · Updated 2 years ago
- Lecture notes for a course on Decision and Game Theory for undergraduates studying AI ☆13 · Dec 14, 2018 · Updated 7 years ago
- An intelligent voice transcription input tool supporting multiple transcription services and high-quality speech recognition features. ☆33 · Updated this week