NVIDIA-developer-blog/code-samples

Source code examples from the Parallel Forall Blog

☆1,332

Alternatives and similar repositories for code-samples

Users who are interested in code-samples are comparing it to the repositories listed below.

NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,840 · Oct 9, 2023 · Updated 2 years ago
NVIDIA-developer-blog / cudacasts
Source code from NVIDIA CUDACasts
☆48 · May 1, 2014 · Updated 12 years ago
NVIDIA / cuda-samples
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
☆9,406 · May 27, 2026 · Updated last month
NVIDIA / thrust
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
☆5,004 · Feb 8, 2024 · Updated 2 years ago
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆908 · Sep 26, 2025 · Updated 9 months ago
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
cudpp / cudpp
CUDA Data Parallel Primitives Library
☆438 · Nov 9, 2018 · Updated 7 years ago
NVIDIA / nccl
Optimized primitives for collective multi-GPU communication
☆4,893 · Updated this week
moderngpu / moderngpu
Patterns and behaviors for GPU computing
☆1,782 · Jan 17, 2026 · Updated 6 months ago
NVIDIA / CUDALibrarySamples
CUDA Library Samples
☆2,463 · Updated this week
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,073 · Jan 3, 2023 · Updated 3 years ago
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆298 · Nov 28, 2018 · Updated 7 years ago
harrism / hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
☆349 · Apr 14, 2022 · Updated 4 years ago
NVIDIA / gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
☆1,399 · Updated this week
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆147 · Aug 18, 2020 · Updated 5 years ago
ArchaeaSoftware / cudahandbook
Source code that accompanies The CUDA Handbook.
☆595 · Updated this week
rmfarber / ParallelProgrammingWithOpenACC
Example codes from the book Parallel Programming With OpenACC
☆87 · Feb 15, 2017 · Updated 9 years ago
NVIDIA / jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
☆573 · Sep 15, 2025 · Updated 10 months ago
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,570 · Updated this week
gunrock / gunrock
Programmable CUDA/C++ GPU Graph Analytics
☆1,096 · Feb 28, 2026 · Updated 4 months ago
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆901 · Updated this week
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆420 · Jan 2, 2025 · Updated last year
NVIDIA / TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone…
☆13,167 · Jul 7, 2026 · Updated last week
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆556 · Sep 8, 2024 · Updated last year
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago
NVIDIA / cuCollections
☆655 · Updated this week
Mellanox / nv_peer_memory
☆399 · Apr 23, 2024 · Updated 2 years ago
NVIDIA / libcudacxx
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
☆2,304 · Feb 7, 2024 · Updated 2 years ago
apache / tvm
Open Machine Learning Compiler Framework
☆13,595 · Updated this week
RRZE-HPC / gpu-benches
collection of benchmarks to measure basic GPU capabilities
☆530 · Oct 24, 2025 · Updated 8 months ago
gpgpu-sim / gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for…
☆1,674 · Feb 15, 2025 · Updated last year
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,442 · Mar 27, 2024 · Updated 2 years ago
pytorch / gloo
Collective communications library with various primitives for multi-machine training.
☆1,438 · Jul 1, 2026 · Updated 2 weeks ago
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆437 · Jan 4, 2024 · Updated 2 years ago
google / gemmlowp
Low-precision matrix multiplication
☆1,844 · Jan 29, 2024 · Updated 2 years ago
flame / how-to-optimize-gemm
☆2,020 · Jul 29, 2023 · Updated 2 years ago
bryancatanzaro / trove
Full-speed Array of Structures access
☆177 · Apr 25, 2023 · Updated 3 years ago
NVIDIA / cccl
CUDA Core Compute Libraries
☆2,435 · Updated this week