xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆164 · Updated 5 years ago
Related projects:
- Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018) ☆190 · Updated 5 years ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆191 · Updated 2 years ago
- Automatic Schedule Exploration and Optimization Framework for Tensor Computations ☆175 · Updated 2 years ago
- Subpart source code of deepcore v0.7 ☆27 · Updated 4 years ago
- TopHub AutoTVM log collections ☆70 · Updated last year
- Benchmark of TVM quantized model on CUDA ☆112 · Updated 4 years ago
- ☆34 · Updated 2 years ago
- Assembler for NVIDIA Volta and Turing GPUs ☆195 · Updated 2 years ago
- Benchmark scripts for TVM ☆73 · Updated 2 years ago
- Place for meetup slides ☆140 · Updated 3 years ago
- Examples for the TVM schedule API ☆97 · Updated last year
- Winograd-based convolution implementation in OpenCL ☆27 · Updated 7 years ago
- TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together ☆63 · Updated 6 years ago
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS ☆64 · Updated 5 years ago
- Simple Training and Deployment of Fast End-to-End Binary Networks ☆159 · Updated 2 years ago
- Quantization of Convolutional Neural Networks ☆237 · Updated last month
- ☆193 · Updated last year
- ☆35 · Updated this week
- An unofficial CUDA assembler for all generations of SASS, hopefully :) ☆74 · Updated last year
- Caffe for Sparse Convolutional Neural Network ☆238 · Updated last year
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU) ☆100 · Updated this week
- ☆141 · Updated last year
- ☆92 · Updated 3 years ago
- A Winograd Minimal Filter Implementation in CUDA ☆20 · Updated 3 years ago
- ☆66 · Updated last year
- PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction ☆256 · Updated 11 months ago
- Implementation of the Winograd minimal convolution algorithm on Intel Architecture ☆37 · Updated 6 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance ☆265 · Updated 2 years ago
- Symmetric INT8 GEMM ☆66 · Updated 4 years ago
- heterogeneity-aware-lowering-and-optimization ☆249 · Updated 7 months ago