chrischoy / CUDA-FFT-Convolution
CUDA FFT convolution
☆14 Updated 9 years ago
Related projects
Alternatives and complementary repositories for CUDA-FFT-Convolution
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 4 years ago
- Fork of magma to include more BLAS ☆28 Updated 7 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm) ☆47 Updated 7 years ago
- High-Performance Tensor Transpose library ☆185 Updated last year
- Library for fast image convolution in neural networks on Intel Architecture ☆29 Updated 7 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆291 Updated 5 years ago
- CUDA Tensor Transpose (cuTT) library ☆50 Updated 7 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆69 Updated 8 years ago
- sparse matrix pre-processing library ☆81 Updated 6 months ago
- Generalized Histograms for CUDA-capable GPUs ☆43 Updated 9 years ago
- Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm ☆34 Updated 5 years ago
- Full-speed Array of Structures access ☆161 Updated last year
- Vector Math Library ☆75 Updated 7 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL ☆135 Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 Updated 7 years ago
- kmeans clustering with multi-GPU capabilities ☆116 Updated last year
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- Multi-dimensional array programming framework for C++ and multi-GPU CUDA applications ☆28 Updated 7 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 Updated 9 years ago
- flexible-gemm conv of deepcore ☆17 Updated 4 years ago
- Subpart source code of deepcore v0.7 ☆27 Updated 4 years ago
- Fast matrix multiplication ☆28 Updated 3 years ago
- how to design cpu gemm on x86 with avx256, that can beat openblas. ☆66 Updated 5 years ago
- ☆90 Updated 7 years ago
- Dolphin - a Deep Learning on MIC architecture Project. ☆25 Updated 10 years ago
- Sparse matrix computation library for GPU ☆54 Updated 4 years ago
- ☆42 Updated 6 years ago
- A few cuda examples built with cmake ☆23 Updated 5 years ago