chrischoy / CUDA-FFT-Convolution
CUDA FFT convolution
☆14 Updated 9 years ago
Related projects:
- A heterogeneous multi-GPU level-3 BLAS library ☆45 Updated 4 years ago
- Library for fast image convolution in neural networks on Intel Architecture ☆29 Updated 7 years ago
- Optimized half-precision GEMM assembly kernels (deprecated due to ROCm) ☆47 Updated 7 years ago
- Fork of MAGMA to include more BLAS ☆28 Updated 7 years ago
- C++ code for computing 1D and 2D convolution products using the FFT, implemented with GSL or FFTW ☆57 Updated 11 years ago
- Greentea LibDNN: a universal convolution implementation supporting CUDA and OpenCL ☆135 Updated 7 years ago
- Sparse matrix pre-processing library ☆81 Updated 4 months ago
- ☆16 Updated this week
- Kernel Fusion and Runtime Compilation Based on NNVM ☆69 Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆96 Updated 7 years ago
- Highly optimized FFT library based on CUDA (as fast as cuFFT, and sometimes faster) ☆18 Updated 7 years ago
- Full-speed Array of Structures access ☆155 Updated last year
- High-Performance Tensor Transpose library ☆183 Updated last year
- Fast matrix multiplication ☆28 Updated 3 years ago
- CUDA Tensor Transpose (cuTT) library ☆49 Updated 7 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆287 Updated 5 years ago
- Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs, based on cublasHgemm ☆34 Updated 5 years ago
- kmeans ☆53 Updated 8 years ago
- Code appendix to an OpenCL matrix-multiplication tutorial ☆160 Updated 7 years ago
- Parallel algorithm based on CUDA ☆62 Updated 6 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆53 Updated 6 years ago
- Tutorial on optimizing GEMM performance on Android ☆51 Updated 8 years ago
- flexible-gemm conv of deepcore ☆17 Updated 4 years ago
- CNNs in Halide ☆22 Updated 8 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 8 years ago
- A few CUDA examples built with CMake ☆23 Updated 5 years ago
- Quantize weights and activations in Recurrent Neural Networks. ☆95 Updated 6 years ago
- RDMA Optimization on MXNet ☆14 Updated 6 years ago
- Example of how to use CUDA with CMake >= 3.8 ☆69 Updated last year