lambda7xx/awesome-AI-system

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/lambda7xx/awesome-AI-system)

lambda7xx / awesome-AI-system

paper and its code for AI System

☆374

Alternatives and similar repositories for awesome-AI-system

Users that are interested in awesome-AI-system are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

AmberLJC / LLMSys-PaperList
View on GitHub
Large Language Model (LLM) Systems Paper List
☆2,194Updated this week
galeselee / Awesome_LLM_System-PaperList
View on GitHub
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆284Mar 6, 2025Updated last year
AmadeusChan / Awesome-LLM-System-Papers
View on GitHub
☆646Jan 14, 2026Updated 6 months ago
xinhaoc / ferret
View on GitHub
Autonomous CUDA kernel optimization agent with structured task specs and per-config scoring
☆17Jun 17, 2026Updated last month
byungsoo-oh / ml-systems-papers
View on GitHub
Curated collection of papers in machine learning systems
☆632Feb 7, 2026Updated 5 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
LLMServe / DistServe
View on GitHub
Disaggregated serving system for Large Language Models (LLMs).
☆827Apr 6, 2025Updated last year
ConnollyLeon / awesome-Auto-Parallelism
View on GitHub
A baseline repository of Auto-Parallelism in Training Neural Networks
☆145Jun 25, 2022Updated 4 years ago
mutinifni / splitwise-sim
View on GitHub
LLM serving cluster simulator
☆157Apr 25, 2024Updated 2 years ago
interestingLSY / swiftLLM
View on GitHub
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆329Jun 10, 2025Updated last year
raywan-110 / AdaQP
View on GitHub
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
☆24Mar 1, 2024Updated 2 years ago
microsoft / vattention
View on GitHub
Dynamic Memory Management for Serving LLMs without PagedAttention
☆504Updated this week
SJTU-IPADS / reef-artifacts
View on GitHub
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43May 29, 2022Updated 4 years ago
flexflow / flexflow-train
View on GitHub
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,895Jul 1, 2026Updated 2 weeks ago
HPMLL / BurstGPT
View on GitHub
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆278Jun 30, 2026Updated 2 weeks ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
merrymercy / awesome-tensor-compilers
View on GitHub
A list of awesome compiler projects and papers for tensor computation and deep learning.
☆2,766Oct 19, 2024Updated last year
efeslab / Atom
View on GitHub
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆343Jul 2, 2024Updated 2 years ago
xlite-dev / Awesome-LLM-Inference
View on GitHub
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,404Jun 23, 2026Updated 3 weeks ago
microsoft / sarathi-serve
View on GitHub
A low-latency & high-throughput serving engine for LLMs
☆511Jan 8, 2026Updated 6 months ago
efeslab / Nanoflow
View on GitHub
A throughput-oriented high-performance serving framework for LLMs
☆968Mar 29, 2026Updated 3 months ago
llumnix-project / llumnix-ray
View on GitHub
Efficient and easy multi-instance LLM serving
☆562Mar 12, 2026Updated 4 months ago
bytedance / QSync
View on GitHub
Official resporitory for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20Feb 23, 2024Updated 2 years ago
microsoft / ParrotServe
View on GitHub
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆222Sep 21, 2024Updated last year
netx-repo / PipeSwitch
View on GitHub
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
☆127May 9, 2022Updated 4 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Infini-AI-Lab / MagicDec
View on GitHub
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆154Dec 4, 2024Updated last year
microsoft / SuperScaler
View on GitHub
An experimental parallel training platform
☆57Mar 25, 2024Updated 2 years ago
Hsword / SpotServe
View on GitHub
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆135Feb 22, 2024Updated 2 years ago
kvcache-ai / Mooncake
View on GitHub
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,909Updated this week
zhuzilin / flash-attention-with-sink
View on GitHub
☆37Aug 7, 2025Updated 11 months ago
LoongServe / LoongServe
View on GitHub
☆135Nov 11, 2024Updated last year
pentium3 / sys_reading
View on GitHub
system paper reading notes
☆262Sep 22, 2025Updated 9 months ago
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,493Jul 11, 2026Updated last week
HuaizhengZhang / AI-Infra-from-Zero-to-Hero
View on GitHub
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Mod…
☆4,206Jul 25, 2025Updated 11 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
flashinfer-ai / debug-print
View on GitHub
Debug print operator for cudagraph debugging
☆18Aug 2, 2024Updated last year
mosharaf / eecs598
View on GitHub
Advanced Topics on Systems for X
☆288Jul 10, 2024Updated 2 years ago
MoE-Inf / awesome-moe-inference
View on GitHub
Curated collection of papers in MoE model inference
☆408Mar 12, 2026Updated 4 months ago
mirage-project / mirage
View on GitHub
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,377Updated this week
horseee / Awesome-Efficient-LLM
View on GitHub
A curated list for Efficient Large Language Models
☆2,023Jun 17, 2025Updated last year
efeslab / fiddler
View on GitHub
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆267Nov 18, 2024Updated last year
Shenggan / awesome-distributed-ml
View on GitHub
A curated list of awesome projects and papers for distributed training or inference
☆280Oct 8, 2024Updated last year