komrad36 / CUDALERP
Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - uint8_t data
☆13Updated 4 years ago
Alternatives and similar repositories for CUDALERP
Users who are interested in CUDALERP are comparing it to the libraries listed below.
- C++ convenience classes to be used with CUDA code, for both the host and the kernel parts.☆55Updated 6 years ago
- ☆68Updated 2 years ago
- Example of how to use CUDA with CMake >= 3.8☆70Updated 2 weeks ago
- Fastest CPU (AVX2) Bilinear and Nearest-Neighbor Interpolation: 25-100% faster than OpenCV. For computer vision / image processing.☆21Updated 4 years ago
- Experimental ranges for CUDA☆24Updated 6 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU…☆35Updated 9 years ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files☆16Updated 5 years ago
- Portable 128-bit SIMD intrinsics☆58Updated last year
- A portable high-level API with CUDA or OpenCL back-end☆54Updated 7 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.☆71Updated 9 years ago
- This repository contains components that will support percolation via OpenCL and CUDA☆32Updated 3 years ago
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!☆99Updated 3 weeks ago
- A machine vision library written in SYCL and C++ that shows performance-portable implementation of graph algorithms☆161Updated last year
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q…☆47Updated 7 months ago
- Concurrent CPU-GPU Programming using Task Models☆103Updated 5 years ago
- Kernel Tuning Toolkit☆60Updated last month
- GPU Optimization and Memory Abstraction Framework☆32Updated 5 years ago
- A collection of code examples for learning parallel programming concepts☆52Updated 4 years ago
- CMake module collection☆30Updated 10 years ago
- CMake module to optimize cflags for architecture extensions such as SSE, AVX☆27Updated 3 months ago
- Tests and benchmarks for cudnn (and in the future, other nvidia libraries)☆53Updated 4 years ago
- Samples from the AMD APP SDK (with OpenCRun support)☆16Updated 7 years ago
- A reference implementation of std::simd, providing data parallel types in the C++ standard☆12Updated 5 years ago
- CUDA kernel author's tools☆111Updated 3 years ago
- Deep Neural Network Architectures with dlib☆19Updated 5 months ago
- An alternative to Boost.MPI for a user friendly C++ interface for MPI (MPICH).☆19Updated 7 years ago
- C++ implementation of neural networks library with Keras-like API. Contains majority of commonly used layers, losses and optimizers. Supp…☆38Updated 4 years ago
- Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provi…☆68Updated last year
- Algorithms implemented in CUDA + resources about GPGPU☆56Updated 3 years ago
- Bitonic Sort for C and CUDA☆16Updated 6 years ago