slowlyC/agent-gpu-skills

☆149

Alternatives and similar repositories for agent-gpu-skills

Users that are interested in agent-gpu-skills are comparing it to the libraries listed below.

maxiaosong1124 / ncu-cuda-profiling-skill
Lets coding agents use ncu skills to analyze CUDA programs automatically.
☆117 • May 25, 2026 • Updated last month
technillogue / ptx-isa-markdown
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
☆215 • Dec 24, 2025 • Updated 6 months ago
TongmingLAIC / AKO4ALL
Agentic Kernel Optimization for All: automated GPU kernel optimization for any kernel, any hardware, any language
☆323 • May 31, 2026 • Updated last month
KernelFlow-ops / cuda-optimized-skill
A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It …
☆191 • Apr 22, 2026 • Updated 2 months ago
mit-han-lab / KernelWiki
☆310 • Jun 9, 2026 • Updated last month
BBuf / KDA-Pilot
☆231 • Updated this week
sablin39 / tilelang-cuda-skills
Skills for writing tilelang and debugging with CUDA toolkits.
☆131 • May 20, 2026 • Updated 2 months ago
cherichy / tilecute
☆32 • Jul 2, 2025 • Updated last year
Tencent / hpc-ops
High Performance LLM Inference Operator Library
☆1,041 • Updated this week
mit-han-lab / kernel-design-agents
☆754 • Jun 2, 2026 • Updated last month
BBuf / AI-Infra-Auto-Driven-SKILLS
☆691 • Jul 14, 2026 • Updated last week
inclusionAI / cuLA
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
☆534 • Updated this week
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆107 • Jul 3, 2026 • Updated 2 weeks ago
ForceInjection / cuda-code-skill
Converts the official NVIDIA PTX ISA 9.1, CUDA 13.1 (Runtime/Driver), Math API 13.x, cuBLAS 13.2, and NCCL documentation into easily searchable Markdown, and provides an accompanying AI IDE skill library (supports Claude Cod…
☆20 • Jul 14, 2026 • Updated last week
SemiAnalysisAI / microbench-blackwell
☆121 • May 10, 2026 • Updated 2 months ago
mit-han-lab / ncu-report-skill
☆156 • May 24, 2026 • Updated last month
ZJLi2013 / awesome-kernel-skills
☆85 • Mar 31, 2026 • Updated 3 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 • Updated this week
gty111 / GEMM_MMA
Optimize GEMM with tensorcore step by step
☆40 • Dec 17, 2023 • Updated 2 years ago
gxinlong / cuda-optimization-skill
A skill for automatically optimizing CUDA code.
☆42 • Mar 26, 2026 • Updated 3 months ago
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,573 • Updated this week
alibaba / redfuser
☆21 • Mar 17, 2026 • Updated 4 months ago
OptimAI-Lab / CudaForge
Official Repo of CudaForge
☆84 • Dec 2, 2025 • Updated 7 months ago
lucifer1004 / VeloQ
Agent-friendly GPU profile-query CLI
☆104 • Jun 22, 2026 • Updated 3 weeks ago
deciding / cutez
CuTeDSL tutorials, tools, autotuner, profiler, etc.
☆40 • Jun 27, 2026 • Updated 3 weeks ago
aikitoria / nanotrace
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137 • Updated this week
KuangjuX / ncu-cli
Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.
☆34 • Mar 18, 2026 • Updated 4 months ago
MoonshotAI / FlashKDA
FlashKDA: high-performance Kimi Delta Attention kernels
☆462 • May 26, 2026 • Updated last month
HarryWu99 / funny_cute
Some funny cute/cuteDSL code snippets
☆33 • Mar 2, 2026 • Updated 4 months ago
dsl-learn / cuda-magic
Fake CUTLASS to get performance
☆26 • Apr 28, 2026 • Updated 2 months ago
HanGuo97 / hilt
☆40 • Dec 14, 2025 • Updated 7 months ago
yhwang-hub / OrinMLLM
This project is primarily used to deploy large language models and multimodal large models on Orin.🚀🚀🚀
☆18 • Jun 23, 2026 • Updated 3 weeks ago
ademeure / cuda-side-boost
☆60 • Feb 24, 2026 • Updated 4 months ago
xlite-dev / ffpa-attn
🤖FFPA: Extends FA-2/3 via Split-D for large headdims, 1.5x~6×↑🎉 vs SDPA, up to 513~535 TFLOPS🎉 on NVIDIA H200.
☆315 • Updated this week
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆173 • Jun 16, 2026 • Updated last month
reed-lau / cute-gemm
☆186 • May 11, 2026 • Updated 2 months ago
mlc-ai / tirx-kernels
ML kernels and benchmarking infrastructure written in TIRx
☆66 • Updated this week
NTT123 / cute-viz
Cute layout visualization
☆43 • Jan 18, 2026 • Updated 6 months ago
TongmingLAIC / AKO4X
Agentic Kernel Optimization, advanced & eXtensible: a closed-loop, campaign-based multi-agent system for optimizing GPU kernels (benchma…
☆61 • May 31, 2026 • Updated last month