komrad36 / CUDALERP
Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - uint8_t data
☆13Updated 4 years ago
Alternatives and similar repositories for CUDALERP
Users who are interested in CUDALERP are comparing it to the libraries listed below.
- Fastest CUDA RGB to grayscale: 5-30x faster than OpenCV. For image processing/computer vision.☆15Updated 4 years ago
- Fastest CPU (AVX2) Bilinear and Nearest-Neighbor Interpolation: 25-100% faster than OpenCV. For computer vision / image processing.☆21Updated 4 years ago
- C++ convenience classes to be used with CUDA code, for both the host and the kernel parts.☆55Updated 6 years ago
- Example of how to use CUDA with CMake >= 3.8☆70Updated 2 years ago
- Experimental ranges for CUDA☆24Updated 6 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.☆70Updated 9 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q…☆47Updated 6 months ago
- Fastest CPU (AVX/SSE) Horizontal Box Blur for image processing and computer vision☆10Updated 4 years ago
- A reference implementation of std::simd, providing data parallel types in the C++ standard☆12Updated 5 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU…☆35Updated 9 years ago
- ☆68Updated 2 years ago
- A simple tool for porting CUDA to OpenCL (DEPRECATED)☆31Updated 6 years ago
- a CUDA implementation of a priority queue☆84Updated 4 years ago
- CMake module to optimize cflags for architecture extensions such as SSE, AVX☆27Updated 2 months ago
- Simple starter code for SYCL and Eigen☆18Updated 8 years ago
- STL-like containers (array, vector, matrix, cube) useable in device code.☆31Updated last year
- CUDA kernel author's tools☆111Updated 3 years ago
- Header file to translate SSE instructions to ARM NEON instructions☆48Updated 11 years ago
- immintrin_dbg.h is an include file, a wrapper around immintrin.h. It implements most of AVX, AVX2, AVX-512 vector intrinsics to enable so…☆56Updated 2 years ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- Mirror JPEG compression and decompression accelerated on GPU☆81Updated 10 years ago
- SIMD implementation of 4x4 and 8x8 Fast DCT with OpenCV demo☆34Updated 8 years ago
- Concurrent CPU-GPU Programming using Task Models☆103Updated 5 years ago
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!☆99Updated last week
- Example of programmatic monitoring of Nvidia GPUs in C++ using NVML library☆31Updated 2 years ago
- Examples for using SYCL on CUDA☆62Updated 3 months ago
- A Halide journey taken for pleasure, this repo will hopefully serve as a collection of Halide imaging functions that are useful to the commu…☆15Updated 9 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆52Updated 2 months ago
- OpenCL specific C++ libraries implemented in C++ for OpenCL kernel language published in releases of OpenCL-Docs☆119Updated 2 years ago
- A machine vision library written in SYCL and C++ that shows performance-portable implementation of graph algorithms☆161Updated last year