A Winograd Minimal Filter Implementation in CUDA
☆30 · Aug 25, 2021 · Updated 4 years ago
Alternatives and similar repositories for openCNN
Users that are interested in openCNN are comparing it to the libraries listed below.
- GPU implementation of Winograd convolution ☆10 · Oct 23, 2017 · Updated 8 years ago
- Accelerating CNN's convolution operation on GPUs by using memory-efficient data access patterns. ☆14 · Dec 8, 2017 · Updated 8 years ago
- ☆32 · Aug 24, 2022 · Updated 3 years ago
- This project is about convolution operator optimization on GPUs, including GEMM-based (implicit GEMM) convolution. ☆44 · Sep 29, 2025 · Updated 8 months ago
- Implementation of the paper - Fast Training of Convolutional Networks through FFTs (CUDA for parallelization) ☆10 · May 8, 2020 · Updated 6 years ago
- Implementation of the Winograd algorithm. ☆24 · Nov 6, 2018 · Updated 7 years ago
- CUDA Tensor Transpose (cuTT) library ☆55 · Aug 10, 2017 · Updated 8 years ago
- Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018) ☆191 · May 7, 2019 · Updated 7 years ago
- The humble incremental-search task switcher for Wox ☆20 · Mar 23, 2014 · Updated 12 years ago
- Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20) ☆27 · Oct 3, 2023 · Updated 2 years ago
- Examples illustrating usage of the rocBLAS library ☆17 · Aug 12, 2024 · Updated last year
- Official implementation of the ICLR'25 paper "QERA: an Analytical Framework for Quantization Error Reconstruction". ☆14 · Feb 4, 2025 · Updated last year
- Implementation of 3d non-separable convolution using CUDA & FFT Convolution ☆20 · Jan 15, 2019 · Updated 7 years ago
- Sparsity support for PyTorch ☆39 · Mar 22, 2025 · Updated last year
- PyTorch implementation of SegBlocks: Towards Block-Based Adaptive Resolution Networks for Fast Segmentation (ECCV2020 Embedded Vision Wor… ☆19 · Mar 31, 2023 · Updated 3 years ago
- for EE1520 NCKU ☆14 · May 1, 2025 · Updated last year
- ☆63 · Jul 21, 2024 · Updated last year
- Efficient SpGEMM on GPU using CUDA and CSR ☆61 · Jul 18, 2023 · Updated 2 years ago
- A library of GPU kernels for sparse matrix operations. ☆288 · Nov 24, 2020 · Updated 5 years ago
- ☆20 · Nov 5, 2024 · Updated last year
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆12 · Jun 24, 2024 · Updated last year
- ☆21 · Jul 30, 2024 · Updated last year
- ☆21 · Jan 23, 2026 · Updated 4 months ago
- Automatic Detection Of Photovoltaic Panels Through Remote Sensing ☆17 · Oct 3, 2020 · Updated 5 years ago
- ☆121 · Apr 11, 2024 · Updated 2 years ago
- 🐱 ncnn int8 model quantization evaluation ☆14 · Oct 10, 2022 · Updated 3 years ago
- image to column ☆30 · Jul 15, 2014 · Updated 11 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆37 · Sep 15, 2023 · Updated 2 years ago
- ☆39 · Feb 28, 2020 · Updated 6 years ago
- GEMM by WMMA (tensor core) ☆15 · Jul 31, 2022 · Updated 3 years ago
- ☆10 · Apr 24, 2023 · Updated 3 years ago
- MagmaDNN: a simple deep learning framework in C++ ☆52 · Aug 21, 2020 · Updated 5 years ago
- SystemVerilog files for lab project on a DNN hardware accelerator ☆18 · Jun 22, 2021 · Updated 4 years ago
- GEMM and Winograd based convolutions using CUTLASS ☆28 · Jul 15, 2020 · Updated 5 years ago
- benchmarking miopen ☆17 · Jan 14, 2019 · Updated 7 years ago
- ☆115 · Feb 26, 2026 · Updated 3 months ago
- Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks ☆25 · Jul 14, 2020 · Updated 5 years ago
- Implementation of the Winograd minimal convolution algorithm on Intel Architecture ☆40 · Dec 4, 2017 · Updated 8 years ago
- A Python tool to measure the energy consumption of software ☆15 · Feb 5, 2026 · Updated 4 months ago