CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples.
☆474Jun 30, 2023Updated 2 years ago
Alternatives and similar repositories for CUDA-by-Example-source-code-for-the-book-s-examples-
Users that are interested in CUDA-by-Example-source-code-for-the-book-s-examples- are comparing it to the libraries listed below.
- ☆490Jul 5, 2015Updated 10 years ago
- Learn CUDA Programming, published by Packt☆1,241Dec 30, 2023Updated 2 years ago
- Companion code for the book GPU高性能编程CUDA实战 ("High-Performance GPU Programming: CUDA in Practice")☆47May 24, 2022Updated 3 years ago
- Samples for CUDA developers that demonstrate features in the CUDA Toolkit☆9,050Mar 30, 2026Updated last week
- A CUDA runtime environment based on the CUDA Driver API☆16Jul 30, 2025Updated 8 months ago
- CUDA Library Samples☆2,372Mar 17, 2026Updated 3 weeks ago
- The CMake version of cuda_by_example☆148Jul 24, 2020Updated 5 years ago
- How to optimize some algorithms in CUDA.☆2,910Apr 1, 2026Updated last week
- Sample codes for my CUDA programming book☆2,033Dec 14, 2025Updated 3 months ago
- An MLIR-based compiler from C/C++ to AMD-Xilinx Versal AIE☆17Aug 5, 2022Updated 3 years ago
- ☆2,724Jan 16, 2024Updated 2 years ago
- This repository contains the results and code for the MLPerf™ Training v2.1 benchmark.☆15Aug 9, 2023Updated 2 years ago
- Transformer related optimization, including BERT, GPT☆6,410Mar 27, 2024Updated 2 years ago
- CUDA Templates and Python DSLs for High-Performance Linear Algebra☆9,536Apr 2, 2026Updated last week
- A simple high performance CUDA GEMM implementation.☆430Jan 4, 2024Updated 2 years ago
- This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…☆1,271Jul 29, 2023Updated 2 years ago
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉☆10,217Updated this week
- Source code examples from the Parallel Forall Blog☆1,325Sep 23, 2025Updated 6 months ago
- Material for gpu-mode lectures☆5,923Feb 1, 2026Updated 2 months ago
- A few cuda examples built with cmake☆24Jul 19, 2019Updated 6 years ago
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks.☆15Jun 1, 2023Updated 2 years ago
- CUDA SGEMM optimization note☆15Oct 31, 2023Updated 2 years ago
- Hands-On GPU Accelerated Computer Vision with OpenCV and CUDA, published by Packt☆659Jan 30, 2023Updated 3 years ago
- Step-by-step optimization of CUDA SGEMM☆453Mar 30, 2022Updated 4 years ago
- ☆19May 17, 2016Updated 9 years ago
- Several simple examples for popular neural network toolkits calling custom CUDA operators.☆1,529Apr 29, 2021Updated 4 years ago
- ☆16Apr 28, 2023Updated 2 years ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆470Mar 10, 2025Updated last year
- flash attention tutorial written in python, triton, cuda, cutlass☆502Jan 20, 2026Updated 2 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated last year
- Examples from Programming in Parallel with CUDA☆170Feb 5, 2026Updated 2 months ago
- This is a Chinese translation of the CUDA programming guide☆1,928Nov 13, 2024Updated last year
- Introduction to Parallel Programming class code☆1,348Jun 27, 2022Updated 3 years ago
- Deploys LDC, a lightweight dense convolutional neural network for edge detection, with ONNXRuntime; includes both C++ and Python versions☆11Apr 24, 2023Updated 2 years ago
- [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl☆5,000Feb 8, 2024Updated 2 years ago
- Development repository for the Triton language and compiler☆18,840Apr 4, 2026Updated last week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl☆1,827Oct 9, 2023Updated 2 years ago
- CUDA Core Compute Libraries☆2,260Updated this week
- GPU programming related news and material links☆2,084Mar 8, 2026Updated last month