debowin / cuda-tiled-2D-convolution
Optimized parallel tiled approach to 2D convolution that takes advantage of the lower-latency, higher-bandwidth shared memory within GPU thread blocks, as well as aggressively cached constant memory.
☆14 Updated 7 years ago
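The repository's own kernel is not reproduced on this page. As a rough illustration of the technique named in the description, the sketch below stages an input tile plus its halo in shared memory and reads the mask from constant memory. The 5×5 mask width, 16×16 output tile, zero padding at the image borders, and all names (`c_mask`, `conv2d_tiled`, `conv2d_gpu`, `conv2d_cpu`) are assumptions made for this example, not details taken from the project.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define MASK_W 5                    // assumed odd mask width
#define RADIUS (MASK_W / 2)
#define TILE 16                     // assumed output tile edge per block
#define BLOCK (TILE + MASK_W - 1)   // input tile edge, halo included

// The mask is read-only and identical for every thread, so it sits in constant memory.
__constant__ float c_mask[MASK_W * MASK_W];

__global__ void conv2d_tiled(const float* in, float* out, int h, int w) {
    __shared__ float tile[BLOCK][BLOCK];
    int tx = threadIdx.x, ty = threadIdx.y;
    int out_r = blockIdx.y * TILE + ty;   // output pixel this thread may produce
    int out_c = blockIdx.x * TILE + tx;
    int in_r = out_r - RADIUS;            // input pixel this thread loads
    int in_c = out_c - RADIUS;

    // Every thread loads one input element; out-of-range positions become zero padding.
    bool inside = in_r >= 0 && in_r < h && in_c >= 0 && in_c < w;
    tile[ty][tx] = inside ? in[in_r * w + in_c] : 0.0f;
    __syncthreads();

    // Only the first TILE x TILE threads of the block compute an output.
    if (tx < TILE && ty < TILE && out_r < h && out_c < w) {
        float acc = 0.0f;
        for (int i = 0; i < MASK_W; ++i)
            for (int j = 0; j < MASK_W; ++j)
                acc += c_mask[i * MASK_W + j] * tile[ty + i][tx + j];
        out[out_r * w + out_c] = acc;     // mask applied without flipping
    }
}

// Host wrapper (hypothetical helper). CUDA error checking is omitted for brevity.
void conv2d_gpu(const std::vector<float>& in, const std::vector<float>& mask,
                std::vector<float>& out, int h, int w) {
    assert(mask.size() == (size_t)(MASK_W * MASK_W));
    assert(in.size() == (size_t)h * w);
    out.assign(in.size(), 0.0f);
    size_t bytes = in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMemcpyToSymbol(c_mask, mask.data(), sizeof(float) * MASK_W * MASK_W);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(BLOCK, BLOCK);
    dim3 grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    conv2d_tiled<<<grid, block>>>(d_in, d_out, h, w);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}

// CPU reference with the same zero-padding convention, for checking results.
void conv2d_cpu(const std::vector<float>& in, const std::vector<float>& mask,
                std::vector<float>& out, int h, int w) {
    out.assign(in.size(), 0.0f);
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < MASK_W; ++i)
                for (int j = 0; j < MASK_W; ++j) {
                    int rr = r + i - RADIUS, cc = c + j - RADIUS;
                    if (rr >= 0 && rr < h && cc >= 0 && cc < w)
                        acc += mask[i * MASK_W + j] * in[rr * w + cc];
                }
            out[r * w + c] = acc;
        }
}
```

In this layout each block launches `BLOCK × BLOCK` threads so that one thread loads one shared-memory element; the threads that only load halo values stay idle during the compute phase. Other layouts (for example `TILE × TILE` threads that each load several elements) trade that idle time for more complex indexing.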
Alternatives and similar repositories for cuda-tiled-2D-convolution
Users who are interested in cuda-tiled-2D-convolution are comparing it to the repositories listed below
- CUDA-based GPU Programming ☆34 Updated last year
- Study parallel programming - CUDA, OpenMP, MPI, Pthread ☆58 Updated 3 years ago
- ☆40 Updated 4 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆91 Updated last year
- A set of hands-on tutorials for CUDA programming ☆230 Updated last year
- Examples for using SYCL on CUDA ☆62 Updated 2 weeks ago
- Introduction to CUDA programming ☆123 Updated 8 years ago
- Examples from Programming in Parallel with CUDA ☆157 Updated 2 years ago
- Serial and parallel implementations of matrix multiplication ☆42 Updated 4 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆138 Updated 4 years ago
- BLAS implementation for Intel FPGA ☆77 Updated 4 years ago
- Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts ☆25 Updated 2 years ago
- A tool to deploy Deep Neural Networks on PULP-based SoCs ☆82 Updated 5 months ago
- Neural Network Acceleration using CPU/GPU, ASIC, FPGA ☆61 Updated 4 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆77 Updated 3 months ago
- Inline PTX Assembly in CUDA example ☆12 Updated 3 years ago
- ☆29 Updated 5 years ago
- CUDA Matrix Multiplication Optimization ☆202 Updated 11 months ago
- Next generation LAPACK implementation for ROCm platform ☆106 Updated this week
- ☆37 Updated this week
- An extension library of WMMA API (Tensor Core API) ☆99 Updated last year
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program ☆21 Updated last year
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆280 Updated last month
- Algorithms implemented in CUDA + resources about GPGPU ☆56 Updated 3 years ago
- MagmaDNN: a simple deep learning framework in C++ ☆50 Updated 4 years ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆71 Updated 4 years ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆236 Updated 10 months ago
- Fast Relaxed Vector Fitting implementation for Python. vectfit3.py module for Python projects ☆15 Updated 6 months ago
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆23 Updated 2 weeks ago
- CUDA Guide ☆70 Updated last year