flashinfer-ai/flashinfer-bench-starter-kit

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

☆178

Alternatives and similar repositories for flashinfer-bench-starter-kit

Users who are interested in flashinfer-bench-starter-kit are comparing it to the libraries listed below.

flashinfer-ai / mlsys26-agent-baseline
☆33 · Mar 12, 2026 · Updated 4 months ago
flashinfer-ai / flashinfer-bench
Building the Virtuous Cycle for AI-driven LLM Systems
☆261 · May 1, 2026 · Updated 2 months ago
HydraQYH / hp_rms_norm
High performance RMSNorm Implement by using SM Core Storage (Registers and Shared Memory)
☆30 · Jan 22, 2026 · Updated 6 months ago
romitjain / kachua-mlsys
[MLSys 26] 🥇 Solution for Gated Delta Net Track of MLSys 26 Flash infer competition
☆35 · May 22, 2026 · Updated 2 months ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
flashinfer-ai / cubloaty
a size profiler for cuda binary
☆71 · Jan 15, 2026 · Updated 6 months ago
mayankagarwals / MLSys-FlashLinfer-Contest
☆49 · Jul 14, 2026 · Updated 2 weeks ago
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆112 · Jul 3, 2026 · Updated 3 weeks ago
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆169 · Dec 9, 2025 · Updated 7 months ago
mlc-ai / pith-train
Compact and Agent-Native MoE Training System
☆305 · Jul 23, 2026 · Updated last week
Dao-AILab / sonic-moe
Accelerating MoE with IO and Tile-aware Optimizations
☆733 · Jul 4, 2026 · Updated 3 weeks ago
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,080 · Updated this week
GindaChen / nsys-ai
Terminal UI for NVIDIA Nsight Systems profiles — timeline viewer, kernel navigator, NVTX hierarchy
☆66 · Updated this week
uccl-project / mKernel
mKernel: fast multi-node, multi-GPU fused kernels
☆257 · Updated this week
Dogacel / auto-gpu-kernel
Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average…
☆148 · Jun 10, 2026 · Updated last month
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated last year
technillogue / ptx-isa-markdown
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
☆220 · Dec 24, 2025 · Updated 7 months ago
HydraQYH / expert_specialization_moe
Expert Specialization MoE Solution based on CUTLASS
☆27 · Apr 14, 2026 · Updated 3 months ago
inclusionAI / cuLA
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
☆535 · Jul 23, 2026 · Updated last week
microsoft / tokenweave
Accepted to MLSys 2026
☆91 · Apr 19, 2026 · Updated 3 months ago
vllm-project / tml-fa4
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17 · Jul 17, 2026 · Updated last week
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆75 · Dec 11, 2025 · Updated 7 months ago
tile-ai / TileFoundry
☆55 · Updated this week
tile-ai / TileOPs
High-performance LLM operator library built on TileLang.
☆165 · Updated this week
mit-han-lab / KernelWiki
☆317 · Jun 9, 2026 · Updated last month
NVIDIA / SOL-ExecBench
A benchmark of real-world DL kernel problems
☆265 · Jul 15, 2026 · Updated 2 weeks ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
KuangjuX / cuda-evolve-oss
Autonomous GPU kernel optimization system driven by AI agents.
☆31 · Mar 29, 2026 · Updated 4 months ago
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,503 · Jul 20, 2026 · Updated last week
NVIDIA / TileGym
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
☆785 · Updated this week
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,600 · Jul 14, 2026 · Updated 2 weeks ago
KuangjuX / ncu-cli
Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.
☆34 · Mar 18, 2026 · Updated 4 months ago
mlc-ai / tirx-kernels
ML kernels and benchmarking infrastructure written in TIRx
☆77 · Updated this week
HPMLL / NVIDIA-Hopper-Benchmark
☆116 · May 31, 2025 · Updated last year
hao-ai-lab / flash-attention-fp4
NVFP4 Flash-Attention 4 on BlackWell
☆31 · Jul 23, 2026 · Updated last week
HarryWu99 / funny_cute
Some funny cute/cuteDSL code snippets
☆33 · Mar 2, 2026 · Updated 4 months ago
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆437 · Updated this week