xuqiantong/CUDA-Winograd

Fast CUDA Kernels for ResNet Inference.

☆183

Alternatives and similar repositories for CUDA-Winograd

Users that are interested in CUDA-Winograd are comparing it to the libraries listed below.

md2z34 / winograd_gpu
GPU implementation of Winograd convolution
☆10 · Oct 23, 2017 · Updated 8 years ago
andravin / wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
☆628 · Feb 9, 2026 · Updated 5 months ago
UDC-GAC / openCNN
A Winograd Minimal Filter Implementation in CUDA
☆31 · Aug 25, 2021 · Updated 4 years ago
quettabit / convolution_kernel
Accelerating CNN's convolution operation on GPUs by using memory-efficient data access patterns.
☆14 · Dec 8, 2017 · Updated 8 years ago
csehydrogen / Winograd-OpenCL
Winograd-based convolution implementation in OpenCL
☆29 · Jan 22, 2017 · Updated 9 years ago
xingyul / sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
☆191 · May 7, 2019 · Updated 7 years ago
masahi / tvm-winograd
Test Winograd convolution written in TVM for CUDA and AMDGPU
☆41 · Oct 12, 2018 · Updated 7 years ago
CSshengxy / MEC
ICML 2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial)
☆17 · Apr 9, 2019 · Updated 7 years ago
marsupialtail / gpu-sparsert
☆18 · Oct 15, 2020 · Updated 5 years ago
dorthyluu / cs194-winograd
☆25 · Dec 1, 2016 · Updated 9 years ago
XiuYuLi / deepcore_source_code
Subpart source code of deepcore v0.7
☆27 · Jun 28, 2020 · Updated 6 years ago
lixiuhong / implicit_gemm_convolution
☆14 · May 28, 2019 · Updated 7 years ago
pku-liang / FlexTensor
Automatic Schedule Exploration and Optimization Framework for Tensor Computations
☆184 · Apr 25, 2022 · Updated 4 years ago
piojanu / CUDA-im2col-conv
CUDA project for uni subject
☆26 · Oct 26, 2020 · Updated 5 years ago
Mudit7 / CUDA-ResNet
☆13 · May 8, 2020 · Updated 6 years ago
istoony / winograd-convolutional-nn
I'm going to use the Winograd’s minimal filtering algorithms to introduce a new class of fast algorithms for convolutional neural networks…
☆12 · Mar 22, 2018 · Updated 8 years ago
lixiuhong / batched_gemm
☆40 · Feb 28, 2020 · Updated 6 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆247 · Jan 13, 2022 · Updated 4 years ago
c3sr / tcu_scope
☆50 · Jun 27, 2019 · Updated 7 years ago
chasingegg / Winconv
Implementation of the Winograd minimal convolution algorithm on Intel Architecture
☆40 · Dec 4, 2017 · Updated 8 years ago
daadaada / gas
☆49 · Dec 11, 2020 · Updated 5 years ago
HangJie720 / DeepLearning-Training-Cuda
Transparent cuDNN / cuBLAS / Eigen usage for deep learning training using the MNIST dataset.
☆18 · Sep 3, 2020 · Updated 5 years ago
cloudcores / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆610 · Apr 20, 2023 · Updated 3 years ago
OpenHero / im2col
Image to column
☆30 · Jul 15, 2014 · Updated 12 years ago
YashasSamaga / ConvolutionBuildingBlocks
GEMM and Winograd based convolutions using CUTLASS
☆28 · Jul 15, 2020 · Updated 6 years ago
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆126 · Jun 23, 2022 · Updated 4 years ago
vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆112 · Jun 19, 2020 · Updated 6 years ago
ravi-teja-mullapudi / Halide-NN
CNNs in Halide
☆22 · Oct 22, 2015 · Updated 10 years ago
TiledTensor / TiledKernel
TiledKernel is a code generation library based on macro kernels and a memory hierarchy graph data structure.
☆19 · May 12, 2024 · Updated 2 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆147 · Aug 18, 2020 · Updated 5 years ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 · Mar 31, 2023 · Updated 3 years ago
StrongSpoon / tvm.schedule
Examples for the TVM schedule API
☆101 · Jun 12, 2023 · Updated 3 years ago
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year
gthparch / NVPTX-SPIRV-Translator
☆28 · Oct 25, 2021 · Updated 4 years ago
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆125 · Oct 26, 2022 · Updated 3 years ago
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆289 · Nov 24, 2020 · Updated 5 years ago
apuaaChen / vectorSparse
☆32 · Aug 24, 2022 · Updated 3 years ago
SwarmArch / T4
Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware"
☆29 · Feb 18, 2022 · Updated 4 years ago
PaddlePaddle / Anakin
High-performance cross-platform inference engine; you can run Anakin on x86-cpu, arm, nv-gpu, amd-gpu, bitmain and cambricon devices.
☆538 · Sep 23, 2022 · Updated 3 years ago