Ketonmi/Awesome-Large-Scale-LLM-Serving

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Ketonmi/Awesome-Large-Scale-LLM-Serving)

Ketonmi / Awesome-Large-Scale-LLM-Serving

Must-read papers on improving efficiency for LLM serving clusters

☆34

Alternatives and similar repositories for Awesome-Large-Scale-LLM-Serving

Users that are interested in Awesome-Large-Scale-LLM-Serving are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

antgroup / cakekv
View on GitHub
☆39Mar 17, 2025Updated last year
OrderLab / awesome-machine-learning-reliability
View on GitHub
A curated reading list for machine learning reliability research and practice
☆31Sep 18, 2025Updated 10 months ago
James-QiuHaoran / LLM-serving-with-proxy-models
View on GitHub
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an …
☆52Jun 1, 2024Updated 2 years ago
Ledengary / CCPS
View on GitHub
Calibrating LLM Confidence by Probing Perturbed Representation Stability
☆19Jul 5, 2025Updated last year
LedgeDash / unum
View on GitHub
☆12Oct 16, 2022Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
self-checker / SelfChecker
View on GitHub
ICSE2021 Submission
☆13Aug 28, 2022Updated 3 years ago
yuwvandy / TDGNN
View on GitHub
☆20Aug 26, 2021Updated 4 years ago
junhongmit / P-and-B
View on GitHub
🧠Plan-and-Budget: Training-free test-time reasoning framework for adaptive token allocation in large language models (ICLR 2026).
☆15Mar 2, 2026Updated 4 months ago
abskl / GPT-Academic
View on GitHub
为GPT/GLM等LLM大语言模型提供实用化交互接口，特别优化论文阅读/润色/写作体验，模块化设计，支持自定义快捷按钮&函数插件，支持Python和C++等项目剖析&自译解功能，PDF/LaTex论文翻译&总结功能，支持并行问询多种LLM模型，支持chatglm3等本地模型…
☆15Feb 18, 2024Updated 2 years ago
ranchlai / nlpcc2023-shared-task-diaASQ
View on GitHub
NLPCC2023 shared-task DiaASQ first-place solution. （NLPCC2023对话式细粒度情感识别大赛第一名方案）
☆15Jun 21, 2023Updated 3 years ago
chengruogu0915 / GeoUni
View on GitHub
Repository for GeoUni, A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions.
☆23Jun 12, 2025Updated last year
alihassanijr / TorchKMeans
View on GitHub
A torch-based implementation of K-Means and K-Means++
☆17Dec 6, 2020Updated 5 years ago
CuewarsTaner / TFT
View on GitHub
【云顶之弈小帮手】TFT-Helper
☆15Jan 6, 2020Updated 6 years ago
tengxiaoliu / LM_skip
View on GitHub
[NeurIPS 2024] Can Language Models Learn to Skip Steps?
☆21Jan 25, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
runchu-tian / LongPiBench
View on GitHub
The repository for papaer "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"
☆14Dec 16, 2024Updated last year
NEO-MLSys25 / NEO
View on GitHub
NEO is a LLM inference engine built to save the GPU memory crisis by CPU offloading
☆100Jun 16, 2025Updated last year
sampsyo / bibble
View on GitHub
another BibTeX-to-HTML (with Jinja2 templates)
☆17Jun 23, 2020Updated 6 years ago
gerkone / painn-jax
View on GitHub
PaiNN in jax
☆11Jan 14, 2025Updated last year
oconradh / benchmark_aflow
View on GitHub
Benchmark AFLOW Data Sets for Machine Learning doi.org/10.1007/s40192-020-00174-4
☆11Aug 29, 2020Updated 5 years ago
RimoChan / elec-pipe
View on GitHub
电动笛子！
☆12Jul 13, 2024Updated 2 years ago
OrderLab / xinda
View on GitHub
Automated Testing and Adaptive Detection of **Slow Faults** in Distributed Systems
☆19Mar 6, 2025Updated last year
OpenSparseLLMs / Open-Pandora
View on GitHub
Open-Pandora: On-the-fly Control Video Generation
☆35Nov 28, 2024Updated last year
ChandlerGuan / kperfir_artifact
View on GitHub
☆19May 9, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
microsoft / TrainVerify
View on GitHub
A verification tool for ensuring parallelization equivalence in distributed model training.
☆18Sep 1, 2025Updated 11 months ago
UNITES-Lab / AgentSymbiotic
View on GitHub
☆14Mar 11, 2025Updated last year
nishadsinghi / sc-genrm-scaling
View on GitHub
[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…
☆15Oct 31, 2025Updated 9 months ago
OrderLab / TrainCheck
View on GitHub
An Observability Framework for AI Training
☆73Jul 16, 2026Updated 2 weeks ago
OrderLab / ePass
View on GitHub
A compiler framework for eBPF programs
☆21Jul 11, 2026Updated 3 weeks ago
efeslab / ConsumerBench
View on GitHub
A benchmarking framework for on-device AI
☆19Updated this week
megagonlabs / holobench
View on GitHub
🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.…
☆12Feb 25, 2025Updated last year
yuwvandy / DPGNN
View on GitHub
☆38Jun 3, 2023Updated 3 years ago
JungHoyoun / PromptCompressor
View on GitHub
☆12Apr 29, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
LINs-lab / ELICIT
View on GitHub
[ICLR 2025] ELICIT: LLM Augmentation Via External In-context Capability
☆14Mar 11, 2025Updated last year
yuwvandy / G2GNN
View on GitHub
☆45Apr 2, 2023Updated 3 years ago
Wimnet / webrtc_performance
View on GitHub
Code for evaluating WebRTC performance
☆14Sep 18, 2017Updated 8 years ago
Xnhyacinth / NesyCD
View on GitHub
[AAAI 2025] Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks
☆12Jun 19, 2025Updated last year
DiT-Serving / TetriServe
View on GitHub
[ASPLOS' 26] TetriServe: Efficiently Serving Mixed DiT Workloads
☆17Mar 12, 2026Updated 4 months ago
huangyuxiang03 / Locret
View on GitHub
☆14Oct 3, 2024Updated last year
bpwu1 / confidence-regulation-neurons
View on GitHub
Confidence Regulation Neurons in Language Models (NeurIPS 2024)
☆15Feb 1, 2025Updated last year