masahi / tvm-winograd

Test winograd convolution written in TVM for CUDA and AMDGPU

☆41

Alternatives and similar repositories for tvm-winograd

Users who are interested in tvm-winograd are comparing it to the repositories listed below.

bwasti / pytorch_compiler_tutorial
Codebase associated with the PyTorch compiler tutorial
☆46 Updated 5 years ago
tvmai / tvmai.github.io
Move to https://github.com/apache/incubator-tvm-site
☆27 Updated 4 years ago
MatthieuCourbariaux / deep-learning-multipliers
Training deep neural networks with low precision multiplications
☆63 Updated 10 years ago
jwfromm / Riptide
Simple Training and Deployment of Fast End-to-End Binary Networks
☆157 Updated 3 years ago
zhuwenxi / pytorch-profiling-tool
☆54 Updated 7 years ago
ARM-software / scalpel
This is a PyTorch implementation of the Scalpel. Node pruning for five benchmark networks and SIMD-aware weight pruning for LeNet-300-100…
☆41 Updated 6 years ago
cc-hpc-itwm / TensorQuant
☆47 Updated 5 years ago
spcl / ucudnn
Accelerating DNN Convolutional Layers with Micro-batches
☆63 Updated 5 years ago
zhangxinqian / example-of-nnvm-in-cpp
An Example of MXNet Models Compilation and Deployment with NNVM in C++
☆16 Updated 7 years ago
dongyp13 / Stochastic-Quantization
Training Low-bits DNNs with Stochastic Quantization
☆74 Updated 8 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70 Updated 8 years ago
Orion34-lanbo / tvm-batch-matmul-example
☆24 Updated 7 years ago
vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆111 Updated 5 years ago
fpeder / espresso
Efficient forward propagation for BCNNs
☆50 Updated 8 years ago
hessamb / lcnn
LCNN: Lookup-based Convolutional Neural Network
☆52 Updated 7 years ago
microsoft / Analysis-Framework-for-TVM
Static analysis framework for analyzing programs written in TVM's Relay IR.
☆28 Updated 5 years ago
zhaoweicai / hwgq
Caffe implementation of accurate low-precision neural networks
☆117 Updated 6 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 Updated 8 years ago
MatthieuCourbariaux / 8-bit-deep-learning
Training neural networks with 8-bit computations
☆28 Updated 9 years ago
intel / ideep
Intel® Optimization for Chainer*, a Chainer module providing numpy like API and DNN acceleration using MKL-DNN.
☆173 Updated this week
mli / dlmark
☆18 Updated 7 years ago
qinyao-he / bit-rnn
Quantize weights and activations in Recurrent Neural Networks.
☆94 Updated 7 years ago
songhan / SqueezeNet-Generator
SqueezeNet Generator
☆31 Updated 7 years ago
VoVAllen / tf-dlpack
DLPack for Tensorflow
☆35 Updated 5 years ago
szha / mxnet-jit-batch
Just-in-time Dynamic Batching with MXNet Gluon.
☆52 Updated 5 years ago
okdshin / instant
DNN Inference with CPU, C++, ONNX support: Instant
☆56 Updated 6 years ago
hcho3 / relayviz
Visualize TVM Relay program graph
☆12 Updated 5 years ago
fengfu-chris / caffe-twns
Implementation of Ternary Weight Networks In Caffe
☆63 Updated 8 years ago
TalwalkarLab / paleo
An analytical performance modeling tool for deep neural networks.
☆89 Updated 4 years ago
xingyul / sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
☆191 Updated 6 years ago