phrb / intro-cuda
Resources and PDFs with an introduction to CUDA programming
☆23Updated 6 years ago
Alternatives and similar repositories for intro-cuda:
Users who are interested in intro-cuda are comparing it to the repositories listed below:
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)☆60Updated this week
- ☆66Updated 11 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"☆89Updated last year
- Learn OpenCL step by step.☆133Updated 2 years ago
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope…☆59Updated this week
- ☆42Updated 7 years ago
- MagmaDNN: a simple deep learning framework in C++☆49Updated 4 years ago
- OpenCL Tutorials☆50Updated 4 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆84Updated last year
- Benchmarking OpenBLAS on the Apple M1☆18Updated 4 years ago
- Introduction to Intel AVX-512☆44Updated last year
- ☆45Updated this week
- Sample code from the book "Professional CUDA C Programming"☆33Updated last year
- Online book on performance analysis tools☆23Updated 5 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆55Updated 2 weeks ago
- pdf☆89Updated 6 years ago
- Next generation library for iterative sparse solvers for ROCm platform☆78Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆50Updated last year
- Add OpenCL optimization for jpeg decode☆14Updated 8 years ago
- Heterogeneous Run Time version of TensorFlow. Added heterogeneous capabilities to the TensorFlow, uses heterogeneous computing infrastruc…☆36Updated 7 years ago
- Sparse-dense matrix-matrix multiplication on GPUs☆15Updated 6 years ago
- Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management.☆20Updated 5 years ago
- A Chinese translation of the English edition of "Heterogeneous Computing with OpenCL 2.0".☆132Updated 4 years ago
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.☆68Updated 5 years ago
- Tutorials on GPU programming. Reading notes.☆17Updated last year
- CUDA C++ syntax support & snippets for VSCode☆20Updated 3 years ago
- Common libraries for PPL projects☆29Updated 4 months ago
- A collection of awesome algorithms, implemented in CUDA.☆24Updated 7 years ago
- ☆11Updated 4 years ago
- ☆12Updated 5 years ago