BBuf/AI-Infra-Auto-Driven-SKILLS



☆690

Alternatives and similar repositories for AI-Infra-Auto-Driven-SKILLS

Users interested in AI-Infra-Auto-Driven-SKILLS are comparing it to the repositories listed below.


BBuf / KDA-Pilot
☆229 · Updated this week
mit-han-lab / kernel-design-agents
☆754 · Jun 2, 2026 · Updated last month
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,573 · Updated this week
PolyArch / humanize
From Automated Idea Factory to Realization
☆1,312 · Updated this week
TongmingLAIC / AKO4ALL
Agentic Kernel Optimization for All: automated GPU kernel optimization for any kernel, any hardware, any language
☆322 · May 31, 2026 · Updated last month
Tencent / hpc-ops
High Performance LLM Inference Operator Library
☆1,036 · Jul 2, 2026 · Updated 2 weeks ago
mit-han-lab / KernelWiki
☆309 · Jun 9, 2026 · Updated last month
KernelFlow-ops / cuda-optimized-skill
A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It …
☆190 · Apr 22, 2026 · Updated 2 months ago
lightseekorg / tokenspeed
TokenSpeed is a speed-of-light LLM inference engine.
☆1,629 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,983 · Updated this week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆996 · Updated this week
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,743 · Updated this week
yzlnew / infra-skills
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo…
☆139 · Jul 9, 2026 · Updated last week
sgl-project / sglang-omni
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
☆651 · Updated this week
sablin39 / tilelang-cuda-skills
Skills for writing tilelang and debugging with CUDA toolkits.
☆131 · May 20, 2026 · Updated 2 months ago
slowlyC / agent-gpu-skills
☆148 · Jun 6, 2026 · Updated last month
BBuf / how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
☆3,141 · Updated this week
CalvinXKY / InfraTech
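Sharing AI Infra knowledge and coding exercises: PyTorch, vLLM/SGLang, introductions to the slime/vime frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI software and hardware 🔧, and more
☆2,975 · Jul 2, 2026 · Updated 2 weeks ago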
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
HarryWu99 / funny_cute
Some funny cute/cuteDSL code snippets
☆33 · Mar 2, 2026 · Updated 4 months ago
xlite-dev / LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆11,578 · Updated this week
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆169 · Dec 9, 2025 · Updated 7 months ago
mit-han-lab / ncu-report-skill
☆156 · May 24, 2026 · Updated last month
radixark / miles
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆1,754 · Updated this week
sgl-project / mini-sglang
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
☆4,598 · May 17, 2026 · Updated 2 months ago
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
☆6,667 · Updated this week
fzyzcjy / torch_utils
Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…
☆114 · Sep 11, 2025 · Updated 10 months ago
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
deepseek-ai / TileKernels
A kernel library written in tilelang
☆1,642 · Apr 23, 2026 · Updated 2 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,493 · Jul 11, 2026 · Updated last week
Dogacel / auto-gpu-kernel
Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average…
☆148 · Jun 10, 2026 · Updated last month
leepoly / sm-profiler
☆82 · Feb 5, 2026 · Updated 5 months ago
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆731 · Jul 4, 2026 · Updated 2 weeks ago
KuangjuX / ncu-cli
Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.
☆34 · Mar 18, 2026 · Updated 4 months ago
QwenLM / FlashQLA
High-performance linear attention kernel library built on TileLang
☆597 · Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,909 · Updated this week
vipshop / cache-dit
A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.
☆1,232 · Updated this week
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆590 · Nov 7, 2025 · Updated 8 months ago
HydraQYH / hp_rms_norm
High-performance RMSNorm implementation using SM core storage (registers and shared memory)
☆30 · Jan 22, 2026 · Updated 5 months ago