RussWong/CUDATutorial

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/RussWong/CUDATutorial)

RussWong / CUDATutorial

A CUDA tutorial to make people learn CUDA program from 0

☆279

Alternatives and similar repositories for CUDATutorial

Users that are interested in CUDATutorial are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

RussWong / LLM-engineering
View on GitHub
☆28Aug 9, 2025Updated 9 months ago
BBuf / how-to-optim-algorithm-in-cuda
View on GitHub
how to optimize some algorithm in cuda.
☆2,998Updated this week
zjhellofss / triton_course
View on GitHub
☆50Mar 4, 2026Updated 2 months ago
Liu-xiandong / How_to_optimize_in_GPU
View on GitHub
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,300Jul 29, 2023Updated 2 years ago
zjhellofss / KuiperLLama
View on GitHub
校招、秋招、春招、实习好项目，带你从零动手实现支持LLama2/3和Qwen2.5的大模型推理框架。
☆539Oct 28, 2025Updated 6 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
xlite-dev / LeetCUDA
View on GitHub
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
☆10,965May 3, 2026Updated 2 weeks ago
Cjkkkk / CUDA_gemm
View on GitHub
A simple high performance CUDA GEMM implementation.
☆434Jan 4, 2024Updated 2 years ago
zjhellofss / KuiperInfer
View on GitHub
校招、秋招、春招、实习好项目！带你从零实现一个高性能的深度学习推理库，支持大模型 llama2 、Unet、Yolov5、Resnet等模型的推理。Implement a high-performance deep learning inference library st…
☆3,427Jun 22, 2025Updated 10 months ago
Eddie-Wang1120 / HPC-Learning-Notes
View on GitHub
高性能计算相关知识学习笔记，包含学习笔记和相关知识的代码demo，在持续完善中。如果有帮助的话请Star一下，对作者帮助很大，谢谢！
☆475Mar 28, 2023Updated 3 years ago
YangLinzhuo / cuda-sgemm-optimization
View on GitHub
CUDA SGEMM optimization note
☆15Oct 31, 2023Updated 2 years ago
luliyucoordinate / flash-attention-minimal
View on GitHub
Flash Attention in ~100 lines of CUDA (forward pass only)
☆12Jun 10, 2024Updated last year
harleyszhang / lite_llama
View on GitHub
A light llama-like llm inference framework based on the triton kernel.
☆186Jan 5, 2026Updated 4 months ago
PaddleJitLab / CUDATutorial
View on GitHub
A self-learning tutorail for CUDA High Performance Programing.
☆989Jan 14, 2026Updated 4 months ago
gpu-mode / lectures
View on GitHub
Material for gpu-mode lectures
☆6,080May 9, 2026Updated last week
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
wangzyon / trt_learn
View on GitHub
TensorRT encapsulation, learn, rewrite, practice.
☆29Oct 19, 2022Updated 3 years ago
AyakaGEMM / Hands-on-GEMM
View on GitHub
☆153Mar 18, 2024Updated 2 years ago
Bruce-Lee-LY / cuda_hgemm
View on GitHub
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆544Sep 8, 2024Updated last year
tspeterkim / flash-attention-minimal
View on GitHub
Flash Attention in ~100 lines of CUDA (forward pass only)
☆1,140Dec 30, 2024Updated last year
xlite-dev / Awesome-LLM-Inference
View on GitHub
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
☆5,229Apr 20, 2026Updated last month
nndeploy / nndeploy
View on GitHub
一款简单易用和高性能的AI部署框架 | An Easy-to-Use and High-Performance AI Deployment Framework
☆1,818Apr 25, 2026Updated 3 weeks ago
sunkx109 / My-Torch-Extension
View on GitHub
A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch.
☆40Jan 24, 2026Updated 3 months ago
BBuf / tvm_mlir_learn
View on GitHub
compiler learning resources collect.
☆2,736Updated this week
sunkx109 / llama.cpp
View on GitHub
llama 2 Inference
☆43Nov 4, 2023Updated 2 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
tpoisonooo / how-to-optimize-gemm
View on GitHub
row-major matmul optimization
☆725May 14, 2026Updated last week
OpenPPL / ppl.llm.kernel.cuda
View on GitHub
☆150Jan 9, 2025Updated last year
openmlsys / openmlsys-cuda
View on GitHub
Tutorials for writing high-performance GPU operators in AI frameworks.
☆135Aug 12, 2023Updated 2 years ago
LitLeo / TensorRT_Tutorial
View on GitHub
☆1,052Mar 13, 2024Updated 2 years ago
BBuf / how-to-learn-deep-learning-framework
View on GitHub
how to learn PyTorch and OneFlow
☆496Updated this week
deep-practice / FastBEV-TensorRT
View on GitHub
☆14Apr 18, 2023Updated 3 years ago
TRT2022 / ControlNet_TensorRT
View on GitHub
天池 NVIDIA TensorRT Hackathon 2023 —— 生成式AI模型优化赛初赛第三名方案
☆50Aug 16, 2023Updated 2 years ago
sesmfs / onnx_quant_tool
View on GitHub
An onnx-based quantitation tool.
☆71Jan 8, 2024Updated 2 years ago
66RING / tiny-flash-attention
View on GitHub
flash attention tutorial written in python, triton, cuda, cutlass
☆516Jan 20, 2026Updated 4 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
ifromeast / cuda_learning
View on GitHub
learning how CUDA works
☆388Mar 3, 2025Updated last year
luliyucoordinate / cute-flash-attention
View on GitHub
Implement Flash Attention using Cute.
☆107Dec 17, 2024Updated last year
KnowingNothing / MatmulTutorial
View on GitHub
A Easy-to-understand TensorOp Matmul Tutorial
☆434Mar 5, 2026Updated 2 months ago
caiwanxianhust / FasterLLaMA
View on GitHub
使用 CUDA C++ 实现的 llama 模型推理框架
☆65Nov 8, 2024Updated last year
Tony-Tan / CUDA_Freshman
View on GitHub
☆2,742Jan 16, 2024Updated 2 years ago
HeKun-NVIDIA / CUDA-Programming-Guide-in-Chinese
View on GitHub
This is a Chinese translation of the CUDA programming guide
☆1,967Nov 13, 2024Updated last year
MARD1NO / CUDA-PPT
View on GitHub
☆135Apr 16, 2026Updated last month