deepreinforce-ai / CUDA-L2
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
☆318 · Updated last week
Alternatives and similar repositories for CUDA-L2
Users interested in CUDA-L2 are comparing it to the repositories listed below.
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆439 · Updated last month
- Fast and Furious AMD Kernels ☆336 · Updated this week
- High-Performance SGEMM on CUDA devices ☆115 · Updated 11 months ago
- Samples of good AI-generated CUDA kernels ☆99 · Updated 7 months ago
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆773 · Updated this week
- Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024) ☆184 · Updated last year
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆185 · Updated this week
- Hand-rolled GPU communications library ☆76 · Updated last month
- An early research-stage expert-parallel load balancer for MoE models based on linear programming. ☆485 · Updated last month
- Custom PTX Instruction Benchmark ☆137 · Updated 10 months ago
- ☆83 · Updated last month
- Ship correct and fast LLM kernels to PyTorch ☆132 · Updated this week
- ☆218 · Updated 11 months ago
- LLM training in simple, raw C/CUDA ☆110 · Updated last year
- mHC kernels implemented in CUDA ☆217 · Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆91 · Updated last week
- ☆117 · Updated 7 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆66 · Updated last week
- Extensible collectives library in Triton ☆92 · Updated 9 months ago
- Kernels, of the mega variety ☆648 · Updated 3 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆48 · Updated 4 months ago
- Helpful kernel tutorials and examples for tile-based GPU programming ☆568 · Updated this week
- ☆114 · Updated last week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 · Updated last year
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆79 · Updated 2 weeks ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆188 · Updated 3 weeks ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆443 · Updated 2 weeks ago
- torchcomms: a modern PyTorch communications API ☆320 · Updated this week
- PyTorch script hot swap: change code without unloading your LLM from VRAM ☆125 · Updated 8 months ago
- PyTorch memory allocation visualizer ☆62 · Updated 6 months ago