OrangeOwlSolutions / General-CUDA-programming
☆42 Updated 7 years ago
Alternatives and similar repositories for General-CUDA-programming:
Users that are interested in General-CUDA-programming are comparing it to the libraries listed below
- Example of how to use CUDA with CMake >= 3.8 ☆69 Updated last year
- Some CUDA design patterns and a bit of template magic for CUDA ☆149 Updated last year
- Example code used in the CVPR 2015 tutorial ☆40 Updated 9 years ago
- This example builds on the parallel-forall repo separate compilation example by adding CMake to it. ☆17 Updated 7 years ago
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.) ☆60 Updated this week
- A C++/CUDA template library for tensor lazy evaluation ☆163 Updated last year
- CS344 - Introduction To Parallel Programming course (Udacity) proposed solutions ☆54 Updated 7 years ago
- Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management. ☆20 Updated 5 years ago
- CNNs in Halide ☆23 Updated 9 years ago
- Learn OpenCL step by step. ☆134 Updated 2 years ago
- Connected Component Labeling. ☆43 Updated 4 years ago
- MWE for using the Eigen library in CUDA kernels ☆118 Updated 2 years ago
- ☆21 Updated 7 years ago
- GPU Implementation of some Connected Component Labeling Algorithms ☆21 Updated 4 years ago
- Simple example of implementing a new TensorFlow operation and its gradient in C++. ☆56 Updated 5 years ago
- PLEASE SEE THE OFFICIAL REPOSITORY. THIS IS NOT MAINTAINED ANYMORE. ☆93 Updated 5 years ago
- ☆66 Updated 11 years ago
- Windows Visual Studio Solutions for class "Introduction to Parallel Programming" ☆19 Updated 6 years ago
- Set of basic classes (vector, matrix, images and memory array) for CPU and GPU ☆17 Updated 4 years ago
- Some CUDA programming examples ☆25 Updated 8 years ago
- Communication-Minimizing 2D Convolution in GPU Registers ☆30 Updated 11 years ago
- CMake Examples (CMake, CMake+CUDA, CMake+CUDA+PandaRoot) ☆41 Updated 11 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPUs based on cublasHgemm ☆34 Updated 5 years ago
- An expression template based linear algebra library running completely on the GPU using CUDA ☆25 Updated 3 years ago
- ONNX Parser is a tool that automatically generates OpenVX inference code (CNN) from ONNX binary model files. ☆18 Updated 6 years ago
- Utilities for CUDA programming ☆40 Updated 5 years ago
- FastHOG library that has been fixed to work with CUDA 5.x on Ubuntu 12.04 ☆20 Updated 11 years ago
- Efficient CUDA Stream Compaction Library ☆33 Updated last year
- A CUDA implementation of the dilation and erosion filters showing several optimizations to speed up the processing. ☆43 Updated 9 years ago