MLSys-Learner-Resources/Awesome-MLSys-Blogger

The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems)

☆341

Alternatives and similar repositories for Awesome-MLSys-Blogger

Users who are interested in Awesome-MLSys-Blogger are comparing it to the repositories listed below.

zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,782 · Updated this week
sustcsonglin / fla-tilelang
☆37 · Mar 7, 2025 · Updated last year
xlite-dev / Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,424 · Updated this week
Yifei-Zuo / Parallax
Official repository for Parallax (Parameterized Local Linear Attention)
☆67 · Jul 7, 2026 · Updated 3 weeks ago
HuaizhengZhang / AI-Infra-from-Zero-to-Hero
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Mod…
☆4,233 · Jul 25, 2025 · Updated last year
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,503 · Jul 20, 2026 · Updated last week
AmberLJC / LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
☆2,204 · Updated this week
xlite-dev / LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆11,655 · Updated this week
xlite-dev / ffpa-attn
🤖FFPA: Extends FA-2/3 via Split-D for large headdims, 1.5x~6×↑🎉 vs SDPA, up to 513~535 TFLOPS🎉 on NVIDIA H200.
☆318 · Updated this week
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆169 · Dec 9, 2025 · Updated 7 months ago
yzlnew / infra-skills
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo…
☆140 · Jul 9, 2026 · Updated 2 weeks ago
sablin39 / tilelang-cuda-skills
Skills for writing tilelang and debugging with CUDA toolkits.
☆133 · May 20, 2026 · Updated 2 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆286 · Mar 6, 2025 · Updated last year
sonnyli / flash_attention_from_scratch
Flash Attention from Scratch on CUDA Ampere
☆187 · Sep 1, 2025 · Updated 10 months ago
SiriusNEO / Triton-Puzzles-Lite
Puzzles for learning Triton, play it with minimal environment configuration!
☆739 · Mar 17, 2026 · Updated 4 months ago
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆146 · Nov 10, 2025 · Updated 8 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
lambda7xx / awesome-AI-system
paper and its code for AI System
☆377 · May 14, 2026 · Updated 2 months ago
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆7,007 · Updated this week
BBuf / how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
☆3,152 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆6,053 · Updated this week
HydraQYH / hp_rms_norm
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆30 · Jan 22, 2026 · Updated 6 months ago
gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆1,107 · Feb 19, 2026 · Updated 5 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
gpu-mode / lectures
Material for gpu-mode lectures
☆6,376 · Jun 15, 2026 · Updated last month
inclusionAI / cuLA
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
☆535 · Updated this week
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆243 · Jan 20, 2026 · Updated 6 months ago
flashinfer-ai / cubloaty
a size profiler for cuda binary
☆71 · Jan 15, 2026 · Updated 6 months ago
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
mit-han-lab / flash-moba
☆251 · Nov 19, 2025 · Updated 8 months ago
tile-ai / TileFoundry
☆55 · Updated this week
mit-han-lab / KernelWiki
☆314 · Jun 9, 2026 · Updated last month
tile-ai / tilelang-puzzles
Learning TileLang with 10 puzzles!
☆352 · May 28, 2026 · Updated 2 months ago
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆566 · Jul 20, 2026 · Updated last week
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
CaucherWang / Steiner-hardness
A new query hardness measure for graph-based ANN indexes. Build unbiased workloads with this hardness to see the actual performance of yo…
☆22 · May 6, 2026 · Updated 2 months ago
flashinfer-ai / flashinfer-bench-starter-kit
FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels
☆178 · Apr 26, 2026 · Updated 3 months ago
toyaix / tritonllm
LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model
☆119 · Apr 28, 2026 · Updated 3 months ago