RichardAns / CUDA-Programs

Examples from Programming in Parallel with CUDA

☆158

Alternatives and similar repositories for CUDA-Programs

Users who are interested in CUDA-Programs are comparing it to the repositories listed below.

CodedK / CUDA-by-Example-source-code-for-the-book-s-examples-
CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. …
☆433 Updated 2 years ago
deeperlearning / professional-cuda-c-programming
☆452 Updated 10 years ago
essentialsofparallelcomputing / EssentialsOfParallelComputing
Main Book repository for the Parallel and High Performance Computing book, Manning Publications
☆210 Updated 3 years ago
FZJ-JSC / tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
☆287 Updated last month
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆362 Updated 3 years ago
olcf / cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
☆831 Updated 11 months ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆213 Updated last year
CUDA-Tutorial / CodeSamples
Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"
☆91 Updated last year
R100001 / Programming-Massively-Parallel-Processors
☆173 Updated last year
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆143 Updated 6 months ago
siboehm / SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
☆786 Updated last year
puttsk / cuda-tutorial
A set of hands-on tutorials for CUDA programming
☆230 Updated last year
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆765 Updated 5 months ago
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆392 Updated last year
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆370 Updated 7 months ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆72 Updated 4 years ago
NVIDIA / nsight-training
Training material for Nsight developer tools
☆163 Updated 11 months ago
cwpearson / nvidia-performance-tools
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
☆131 Updated 5 years ago
rox906 / tcFFT
☆41 Updated 4 years ago
NVIDIA / cuCollections
☆561 Updated this week
KernelTuner / kernel_tuner
Kernel Tuner
☆356 Updated last week
ArchaeaSoftware / cudahandbook
Source code that accompanies The CUDA Handbook.
☆532 Updated 6 months ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆138 Updated 4 years ago
XiaoSong9905 / CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT]
☆309 Updated 2 years ago
Apress / data-parallel-CPP
Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A…
☆275 Updated 4 months ago
RRZE-HPC / gpu-benches
collection of benchmarks to measure basic GPU capabilities
☆401 Updated 5 months ago
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆65 Updated 3 weeks ago
XiaoSong9905 / HPC-Notes
Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]
☆69 Updated 3 years ago
PacktPublishing / Learn-CUDA-Programming
Learn CUDA Programming, published by Packt
☆1,173 Updated last year
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆692 Updated this week