AlphaGPU/leetgpu-challenges

LeetGPU Challenges

☆1,004

Alternatives and similar repositories for leetgpu-challenges

Users who are interested in leetgpu-challenges are comparing it to the repositories listed below.

dsl-learn / LeetGPU
LeetGPU Solutions
☆123 · Oct 9, 2025 · Updated 9 months ago
xlite-dev / LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆11,578 · Updated this week
GeeeekExplorer / nano-vllm
Nano vLLM
☆14,557 · Apr 26, 2026 · Updated 2 months ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,988 · Updated this week
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,674 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
sgl-project / mini-sglang
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
☆4,607 · May 17, 2026 · Updated 2 months ago
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆107 · Jul 3, 2026 · Updated 2 weeks ago
lzyrapx / LeetGPU
🌈 Solutions of LeetGPU
☆94 · Jun 11, 2026 · Updated last month
tile-ai / tilelang-puzzles
Learning TileLang with 10 puzzles!
☆338 · May 28, 2026 · Updated last month
gpu-mode / lectures
Material for gpu-mode lectures
☆6,330 · Jun 15, 2026 · Updated last month
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,494 · Updated this week
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
NVIDIA / cutile-python
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
☆2,121 · Updated this week
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆361 · Updated this week
mlc-ai / modern-gpu-programming-for-mlsys
A tutorial on modern GPU programming for machine learning systems
☆1,017 · Updated this week
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes for ML SYS.
☆6,753 · Updated this week
NVIDIA / SOL-ExecBench
A benchmark of real-world DL kernel problems
☆257 · Updated this week
NVIDIA / cuda-tile
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base…
☆999 · Jul 6, 2026 · Updated 2 weeks ago
siboehm / SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
☆1,256 · Sep 2, 2025 · Updated 10 months ago
serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago
SiriusNEO / Triton-Puzzles-Lite
Puzzles for learning Triton, play it with minimal environment configuration!
☆735 · Mar 17, 2026 · Updated 4 months ago
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆290 · Jul 14, 2026 · Updated last week
mayankagarwals / MLSys-FlashLinfer-Contest
☆48 · Jul 14, 2026 · Updated last week
Wenyueh / MinivLLM
Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementation
☆923 · Mar 16, 2026 · Updated 4 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
inclusionAI / cuLA
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
☆534 · Updated this week
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,573 · Updated this week
NVIDIA / TileGym
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
☆776 · Updated this week
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,545 · Updated this week
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195 · Feb 11, 2026 · Updated 5 months ago
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
☆1,148 · Mar 24, 2026 · Updated 3 months ago
deepseek-ai / TileKernels
A kernel library written in tilelang
☆1,643 · Apr 23, 2026 · Updated 2 months ago
tile-ai / TileOPs
High-performance LLM operator library built on TileLang.
☆161 · Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,925 · Updated this week
shinezyy / deepseek_model
☆41 · Oct 12, 2025 · Updated 9 months ago
pranjalssh / fast.cu
Fastest kernels written from scratch
☆583 · Sep 18, 2025 · Updated 10 months ago
Tongkaio / CUDA_Kernel_Samples
Hand-written CUDA operators and interview guide
☆1,045 · Aug 23, 2025 · Updated 10 months ago
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,727 · Updated this week