drkennetz / cuda_examples
Some CUDA example code with READMEs.
☆19 · Updated last month
Alternatives and similar repositories for cuda_examples:
Users who are interested in cuda_examples are comparing it to the libraries listed below.
- NVIDIA tools guide ☆93 · Updated last week
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆171 · Updated this week
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast! ☆74 · Updated this week
- ☆23 · Updated 2 years ago
- Serial and parallel implementations of matrix multiplication ☆39 · Updated 3 years ago
- ☆31 · Updated this week
- Distributed View Extension for Kokkos ☆43 · Updated last month
- LLM training in simple, raw C/CUDA ☆91 · Updated 8 months ago
- Repository with examples and exercises for OLCF and AMD's HIP training series ☆14 · Updated last year
- Examples for using SYCL on CUDA ☆60 · Updated 2 weeks ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆17 · Updated 4 years ago
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆31 · Updated 4 months ago
- ☆82 · Updated last year
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆104 · Updated this week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆73 · Updated last year
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆48 · Updated this week
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆188 · Updated 2 years ago
- NVIDIA Performance Libraries: Sample code ☆21 · Updated last week
- Resources for the introduction to GPU programming course of the HPC-AI specialized master's program ☆22 · Updated last year
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆47 · Updated last year
- Exercises and Solutions for "Programming Your GPU with OpenMP: A Hands-On Introduction" ☆125 · Updated 2 months ago
- Little OpenMP Library ☆158 · Updated 2 years ago
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆77 · Updated 5 months ago
- CUDA Kernel Benchmarking Library ☆547 · Updated 2 months ago
- Copy-hiding array abstraction to automatically migrate data between memory spaces ☆106 · Updated this week
- CUDA Matrix Multiplication Optimization ☆153 · Updated 6 months ago
- CUDA Learning guide ☆289 · Updated 7 months ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆58 · Updated 7 months ago
- Reusable software components for ROCm developers ☆81 · Updated this week