zhangtianhong-1998 / Cuda_learn
A course for learning CUDA from scratch
☆13 · Updated last year
Alternatives and similar repositories for Cuda_learn
Users interested in Cuda_learn are comparing it to the repositories listed below.
- This project offers a brief introduction to CUDA, covering the CUDA execution model, the thread hierarchy, the CUDA memory model, how to write kernels, and two ways to build CUDA extensions for PyTorch. It serves as a basic primer on developing PyTorch-based CUDA extensions. ☆94 · Updated 4 years ago
- manim animation source code for the optimization notes on my Zhihu ☆41 · Updated 4 years ago
- Awesome code, projects, books, etc. related to CUDA ☆26 · Updated 3 months ago
- CPU Memory Compiler and Parallel Programming ☆26 · Updated last year
- An introductory tutorial on CUDA programming ☆37 · Updated last year
- RSS 2025: Morpheus-Software ☆21 · Updated 3 months ago
- Panning for gold in the 💩 ☆28 · Updated last week
- Tutorial for writing custom PyTorch C++/CUDA kernels, applied to volume rendering (NeRF) ☆29 · Updated last year
- ☆15 · Updated 3 months ago
- Solutions to Programming Massively Parallel Processors, 2nd Edition ☆33 · Updated 3 years ago
- Implement custom operators in PyTorch with CUDA/C++ ☆73 · Updated 2 years ago
- ECE408 (Applied Parallel Programming) Fall 2022 MP ☆16 · Updated 2 years ago
- A minimal, easy-to-read PyTorch reimplementation of Qwen3 and Qwen2.5 VL with a fancy CLI ☆191 · Updated last week
- Implementation of FlashAttention in PyTorch ☆173 · Updated 10 months ago
- Official implementation of SPGrasp: a framework for dynamic grasp synthesis from sparse spatiotemporal prompts. ☆14 · Updated 3 months ago
- ☆42 · Updated 6 months ago
- A lightweight LLaMA-like LLM inference framework based on Triton kernels. ☆163 · Updated 2 months ago
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding. ☆43 · Updated 2 months ago
- PaddlePaddle (飞桨) Escort Plan training camp ☆20 · Updated 3 weeks ago
- ☆31 · Updated 4 months ago
- A Survey of Efficient Attention Methods: Hardware-Efficient, Sparse, Compact, and Linear Attention ☆226 · Updated 2 months ago
- Hands-on large-model deployment: TensorRT-LLM, Triton Inference Server, vLLM ☆26 · Updated last year
- A LibTorch tutorial in Chinese. ☆141 · Updated last year
- ☆43 · Updated 3 years ago
- ☆39 · Updated 6 months ago
- The first open-source system for large-scale scene reconstruction training and rendering. ☆55 · Updated last year
- ShanghaiTech CS101 Algorithm and Data Structures, Fall 2022, Fall 2024. ☆10 · Updated last month
- Parallel Prefix Sum (Scan) with CUDA ☆27 · Updated last year
- Tutorials for writing high-performance GPU operators in AI frameworks. ☆134 · Updated 2 years ago
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference. ☆45 · Updated 5 months ago
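Several repositories above center on the scan primitive (e.g. "Parallel Prefix Sum (Scan) with CUDA"). As a hedged sketch only, here is the Hillis–Steele inclusive scan in plain Python, showing the log-step structure a CUDA kernel would parallelize; the function name `inclusive_scan` is ours, not taken from any listed repo:

```python
def inclusive_scan(xs):
    """Hillis-Steele inclusive scan: O(log n) parallel steps,
    each step adding the element `offset` positions to the left."""
    out = list(xs)
    offset = 1
    while offset < len(out):
        # In a CUDA kernel, each element here would be handled by its own
        # thread; double buffering (or __syncthreads) keeps reads and
        # writes from the same step consistent.
        out = [out[i] + out[i - offset] if i >= offset else out[i]
               for i in range(len(out))]
        offset *= 2
    return out

print(inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# -> [3, 4, 11, 11, 15, 16, 22, 25]
```

Each pass doubles the stride, so an array of length n is scanned in ceil(log2 n) sweeps; a work-efficient CUDA version (Blelloch) instead uses an up-sweep/down-sweep over shared memory.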
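The FlashAttention reimplementation listed above rests on the online (streaming) softmax recurrence, which lets attention be computed tile by tile without materializing the full score matrix. A minimal pure-Python sketch of that recurrence for a single row of scores, with an illustrative name `online_softmax` of our own choosing:

```python
import math

def online_softmax(scores):
    """One-pass softmax: keep a running max m and running denominator d,
    rescaling d whenever a larger max appears. This is the per-tile
    update FlashAttention applies to attention scores."""
    m = float("-inf")  # running maximum seen so far
    d = 0.0            # running sum of exp(x - m)
    for x in scores:
        m_new = max(m, x)
        # Rescale the old denominator to the new max, then add this term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in scores]

print(online_softmax([1.0, 2.0, 3.0]))
```

Because every term is exponentiated relative to the running max, no intermediate value overflows even for large scores, and the single pass over the data is what makes the tiled, memory-efficient kernel possible.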