komrad36 / CUDALERP

Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - uint8_t data

☆13

Alternatives and similar repositories for CUDALERP

Users who are interested in CUDALERP are comparing it to the libraries listed below.

mgopshtein / cudacpp
C++ convenience classes to be used with CUDA code, for both the host and the kernel parts.
☆55 Updated 6 years ago
intel / clGPU
☆68 Updated 2 years ago
lukeyeager / cmake-cuda-example
Example of how to use CUDA with CMake >= 3.8
☆70 Updated 2 weeks ago
komrad36 / KLERP
Fastest CPU (AVX2) Bilinear and Nearest-Neighbor Interpolation: 25-100% faster than OpenCV. For computer vision / image processing.
☆21 Updated 4 years ago
pdziepak / ranges-gpu
Experimental ranges for CUDA
☆24 Updated 6 years ago
adnanozsoy / CUDA_Compression
A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU…
☆35 Updated 9 years ago
CoffeeBeforeArch / spring_2020_tutorial
"Hardware, Software, and Compilers! Oh My!" tutorial files
☆16 Updated 5 years ago
Maratyszcza / psimd
Portable 128-bit SIMD intrinsics
☆58 Updated last year
CNugteren / CLCudaAPI
A portable high-level API with CUDA or OpenCL back-end
☆54 Updated 7 years ago
milakov / int_fastdiv
Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
☆71 Updated 9 years ago
STEllAR-GROUP / hpxcl
This repository contains components that will support percolation via OpenCL and CUDA
☆32 Updated 3 years ago
ashvardanian / ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
☆99 Updated 3 weeks ago
codeplaysoftware / visioncpp
A machine vision library written in SYCL and C++ that shows performance-portable implementation of graph algorithms
☆161 Updated last year
jrmadsen / PTL
Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q…
☆47 Updated 7 months ago
Heteroflow / Heteroflow
Concurrent CPU-GPU Programming using Task Models
☆103 Updated 5 years ago
HiPerCoRe / KTT
Kernel Tuning Toolkit
☆60 Updated last month
maps-gpu / MAPS
GPU Optimization and Memory Abstraction Framework
☆32 Updated 5 years ago
CoffeeBeforeArch / parallel_programming
A collection of code examples for learning parallel programming concepts
☆52 Updated 4 years ago
hanjianwei / cmake-modules
CMake module collection
☆30 Updated 10 years ago
miurahr / cmake-optimize-architecture-flag
CMake module to optimize cflags for architecture extensions such as SSE, AVX
☆27 Updated 3 months ago
google / nvidia_libs_test
Tests and benchmarks for cudnn (and in the future, other nvidia libraries)
☆53 Updated 4 years ago
OpenCL / AMD_APP_samples
Samples from the AMD APP SDK (with OpenCRun support)
☆16 Updated 7 years ago
Twon / std-experimental-simd
A reference implementation of std::simd, providing data parallel types in the C++ standard
☆12 Updated 5 years ago
eyalroz / cuda-kat
CUDA kernel author's tools
☆111 Updated 3 years ago
dlibml / dnn
Deep Neural Network Architectures with dlib
☆19 Updated 5 months ago
klalumiere / NiceMPI
An alternative to Boost.MPI for a user-friendly C++ interface for MPI (MPICH).
☆19 Updated 7 years ago
Cr33zz / Neuro_
C++ implementation of neural networks library with Keras-like API. Contains majority of commonly used layers, losses and optimizers. Supp…
☆38 Updated 4 years ago
ROCm / atmi
Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provi…
☆68 Updated last year
rbaygildin / learn-gpgpu
Algorithms implemented in CUDA + resources about GPGPU
☆56 Updated 3 years ago
nickjillings / bitonic-sort
Bitonic Sort for C and CUDA
☆16 Updated 6 years ago