dsl-learn/LeetGPU

LeetGPU Solutions

☆124

Alternatives and similar repositories for LeetGPU

Users who are interested in LeetGPU are comparing it to the repositories listed below.

rishisankar / leetgpu
Solutions to leetgpu CUDA challenges on https://leetgpu.com/
☆19 · May 25, 2025 · Updated last year
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
xlite-dev / ffpa-attn
FFPA: Kernel Library for Large Headdim Attention - 1.5x~6x speedup over PyTorch SDPA.
☆319 · Updated this week
serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago
HarryWu99 / funny_cute
Some funny cute/cuteDSL code snippets
☆33 · Mar 2, 2026 · Updated 4 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
ChengzhuUwU / LuisaComputeSimulator
High-Performance Cross-Platform GPU-Based Physics Simulator, Based on LuisaCompute
☆32 · Jun 10, 2026 · Updated last month
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
AlphaGPU / leetgpu-challenges
LeetGPU Challenges
☆1,024 · Updated this week
zhuzilin / flash-attention-with-sink
☆37 · Aug 7, 2025 · Updated 11 months ago
ROCm / tritonBLAS
A lightweight triton-based General Matrix Multiplication (GEMM) library.
☆66 · Jul 21, 2026 · Updated last week
shinezyy / deepseek_model
☆42 · Oct 12, 2025 · Updated 9 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
flagos-ai / libtriton_jit
A Triton JIT runtime and ffi provider in C++
☆37 · Updated this week
sonnyli / flash_attention_from_scratch
Flash Attention from Scratch on CUDA Ampere
☆188 · Sep 1, 2025 · Updated 10 months ago
HanGuo97 / hilt
☆40 · Dec 14, 2025 · Updated 7 months ago
MuGdxy / SymEigen
A Single .py File Sympy Extension to Generate Eigen C++ Code from the Symbols.
☆12 · Dec 17, 2025 · Updated 7 months ago
zartbot / gfd
GPU Functional Descriptor for memory access
☆34 · May 24, 2026 · Updated 2 months ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,503 · Jul 20, 2026 · Updated last week
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆200 · Jan 27, 2026 · Updated 6 months ago
JJXiangJiaoJun / cutlass_gemv
GEMV implementation with CUTLASS
☆21 · Aug 21, 2025 · Updated 11 months ago
technillogue / ptx-isa-markdown
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
☆220 · Dec 24, 2025 · Updated 7 months ago
ZiXuanVickyLu / culbvh
lbvh implementation and benchmark following jerry's (https://github.com/jerry060599/KittenGpuLBVH) optimization
☆20 · Dec 4, 2025 · Updated 7 months ago
sukoncon / TMA-Adaptive-FP8-Grouped-GEMM
☆27 · Aug 28, 2025 · Updated 11 months ago
gty111 / GEMM_MMA
Optimize GEMM with tensorcore step by step
☆40 · Dec 17, 2023 · Updated 2 years ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆68 · Mar 25, 2025 · Updated last year
aikitoria / nanotrace
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
☆136 · Jul 17, 2026 · Updated last week
togethercomputer / ParallelKernelBench
☆45 · Updated this week
ColfaxResearch / cfx-article-src
☆193 · May 7, 2025 · Updated last year
lzyrapx / LeetGPU
🌈 Solutions of LeetGPU
☆95 · Jun 11, 2026 · Updated last month
xlite-dev / LeetCUDA
LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.
☆11,662 · Updated this week
leepoly / sm-profiler
☆84 · Feb 5, 2026 · Updated 5 months ago
shizhengLi / cuda-triton-learning
CUDA & Triton Learning Project: Exploring Flash Attention Implementations
☆37 · Aug 14, 2025 · Updated 11 months ago
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆363 · Updated this week
yzlnew / infra-skills
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo…
☆140 · Jul 9, 2026 · Updated 3 weeks ago
Tencent / hpc-ops
High Performance LLM Inference Operator Library
☆1,070 · Updated this week
gpu-mode / triton-index
Cataloging released Triton kernels.
☆310 · Sep 9, 2025 · Updated 10 months ago
spiriMirror / libuipc-doc
Unified Incremental Potential Contact Framework Documentation
☆15 · Jul 8, 2026 · Updated 3 weeks ago