andravin/wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
☆611 · Updated 4 years ago
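For context, wincnn symbolically derives the transform matrices of Winograd's minimal filtering algorithms F(m, r). The snippet below is a minimal hand-written sketch using the standard F(2,3) matrices from Lavin & Gray's "Fast Algorithms for Convolutional Neural Networks" (it is not wincnn's own API or output), showing how such matrices produce two outputs of a 3-tap filter with four multiplications instead of six:

```python
# Sketch of 1-D Winograd F(2,3): y = A^T [(G g) * (B^T d)]
# Transform matrices for interpolation points (0, 1, -1), as in Lavin & Gray.
import numpy as np

BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input (data) transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])               # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output (inverse) transform

def winograd_f23(d, g):
    """Two outputs of a 3-tap correlation over a 4-element tile, 4 multiplies."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([0.5, 1.0, -1.0])       # filter taps
print(winograd_f23(d, g))                         # [-0.5, 0.0]
print(np.correlate(d, g, mode='valid'))           # same result, 6 multiplies
```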
Alternatives and similar repositories for wincnn:
Users interested in wincnn are comparing it to the libraries listed below.
- Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018) · ☆190 · Updated 5 years ago
- Caffe for Sparse Convolutional Neural Network · ☆238 · Updated 2 years ago
- Ristretto: Caffe-based approximation of convolutional neural networks. · ☆291 · Updated 5 years ago
- Fast CUDA Kernels for ResNet Inference. · ☆171 · Updated 5 years ago
- BLISlab: A Sandbox for Optimizing GEMM · ☆498 · Updated 3 years ago
- Caffe Implementation for Incremental network quantization · ☆190 · Updated 6 years ago
- Automatic Schedule Exploration and Optimization Framework for Tensor Computations · ☆176 · Updated 2 years ago
- An efficient framework for convolutional neural networks · ☆274 · Updated last year
- TVM integration into PyTorch · ☆452 · Updated 5 years ago
- (New version is out: https://github.com/hpi-xnor/BMXNet-v2) BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet · ☆350 · Updated 5 years ago
- Explore the energy-efficient dataflow scheduling for neural networks. · ☆219 · Updated 4 years ago
- Caffe implementation of accurate low-precision neural networks · ☆117 · Updated 6 years ago
- Training Deep Neural Networks with binary weights during propagations · ☆377 · Updated 9 years ago
- ☆402 · Updated 5 years ago
- Low-precision matrix multiplication · ☆1,792 · Updated last year
- Optimizing Mobile Deep Learning on ARM GPU with TVM · ☆180 · Updated 6 years ago
- A CUDNN minimal deep learning training code sample using LeNet. · ☆264 · Updated last year
- Caffe for Sparse and Low-rank Deep Neural Networks · ☆378 · Updated 4 years ago
- tophub autotvm log collections · ☆70 · Updated 2 years ago
- Place for meetup slides · ☆140 · Updated 4 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description. · ☆974 · Updated 5 months ago
- Quantization of Convolutional Neural networks. · ☆243 · Updated 6 months ago
- Dive into Deep Learning Compiler · ☆647 · Updated 2 years ago
- An exploration of log domain "alternative floating point" for hardware ML/AI accelerators. · ☆392 · Updated last year
- A collection of works on reducing model sizes or building ASIC/FPGA accelerators for machine learning · ☆556 · Updated last year
- Symbolic Expression and Statement Module for new DSLs · ☆205 · Updated 4 years ago
- Simple Training and Deployment of Fast End-to-End Binary Networks · ☆158 · Updated 3 years ago
- ☆195 · Updated last year
- BinaryNets in TensorFlow with XNOR GEMM op · ☆155 · Updated 7 years ago
- Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 · ☆298 · Updated 3 years ago