BytedTsinghua-SIA/CUDA-Agent

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

☆1,114

Alternatives and similar repositories for CUDA-Agent

Users interested in CUDA-Agent are comparing it to the libraries listed below.

ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
☆1,148 · Mar 24, 2026 · Updated 3 months ago
meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation & Optimization via Deep Agents
☆486 · Updated this week
sablin39 / tilelang-cuda-skills
Skills for writing tilelang and debugging with CUDA toolkits.
☆131 · May 20, 2026 · Updated 2 months ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,988 · Updated this week
kcxain / Awesome-LLM4Kernel
LLM4Kernel: A Survey of Large Language Models for GPU Kernel Development
☆76 · Mar 31, 2026 · Updated 3 months ago
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,674 · Updated this week
mit-han-lab / kernel-design-agents
☆754 · Jun 2, 2026 · Updated last month
KernelFlow-ops / cuda-optimized-skill
A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It …
☆191 · Apr 22, 2026 · Updated 2 months ago
flagos-ai / awesome-LLM-driven-kernel-generation
Review automated kernel generation in the era of LLMs
☆273 · Jun 25, 2026 · Updated 3 weeks ago
hkust-nlp / KernelGYM
[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations [ICML…
☆193 · Mar 29, 2026 · Updated 3 months ago
deepseek-ai / TileKernels
A kernel library written in tilelang
☆1,643 · Apr 23, 2026 · Updated 2 months ago
caoshiyi / K-Search
Automated High-Performance GPU Kernel Generation
☆120 · Jun 1, 2026 · Updated last month
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
RightNow-AI / autokernel
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
☆1,469 · Mar 19, 2026 · Updated 4 months ago
mit-han-lab / KernelWiki
☆310 · Jun 9, 2026 · Updated last month
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
Tencent / hpc-ops
High Performance LLM Inference Operator Library
☆1,041 · Updated this week
xlite-dev / LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆11,578 · Updated this week
NVIDIA / SOL-ExecBench
A benchmark of real-world DL kernel problems
☆257 · Updated this week
technillogue / ptx-isa-markdown
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
☆215 · Dec 24, 2025 · Updated 6 months ago
QwenLM / FlashQLA
high-performance linear attention kernel library built on TileLang
☆597 · Updated this week
mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆2,376 · Updated this week
NVIDIA / cutile-python
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
☆2,121 · Updated this week
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆732 · Jul 4, 2026 · Updated 2 weeks ago
OptimAI-Lab / CudaForge
Official Repo of CudaForge
☆84 · Dec 2, 2025 · Updated 7 months ago
NVIDIA / TileGym
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
☆776 · Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,063 · Updated this week
mit-han-lab / ncu-report-skill
☆156 · May 24, 2026 · Updated last month
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,551 · Updated this week
BBuf / KDA-Pilot
☆231 · Updated this week
MoonshotAI / FlashKDA
FlashKDA: high-performance Kimi Delta Attention kernels
☆462 · May 26, 2026 · Updated last month
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆361 · Updated this week
thunlp / TritonBench
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
☆137 · Jun 14, 2025 · Updated last year
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆489 · Jul 5, 2026 · Updated 2 weeks ago
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,545 · Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,925 · Updated this week
open-lm-engine / coda-kernels
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
☆230 · Updated this week
flashinfer-ai / flashinfer-bench
Building the Virtuous Cycle for AI-driven LLM Systems
☆259 · May 1, 2026 · Updated 2 months ago
tile-ai / tilelang-puzzles
Learning TileLang with 10 puzzles!
☆338 · May 28, 2026 · Updated last month