maltanar / gemmbitserial

Fast matrix multiplication for few-bit integer matrices on CPUs.

☆28

Alternatives and similar repositories for gemmbitserial

Users interested in gemmbitserial are comparing it to the repositories listed below.

qinyao-he / bit-rnn
Quantize weights and activations in Recurrent Neural Networks.
☆95 Updated 7 years ago
jwfromm / Riptide
Simple Training and Deployment of Fast End-to-End Binary Networks
☆158 Updated 3 years ago
YulhwaKim / cutlass_tilesparse
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.
☆50 Updated 7 years ago
masahi / tvm-winograd
Test winograd convolution written in TVM for CUDA and AMDGPU
☆41 Updated 7 years ago
MatthieuCourbariaux / deep-learning-multipliers
Training deep neural networks with low precision multiplications
☆64 Updated 10 years ago
gplhegde / convolution-flavors
Implementation of convolution layer in different flavors
☆68 Updated 8 years ago
facebookresearch / deepfloat
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.
☆397 Updated 2 years ago
cc-hpc-itwm / TensorQuant
☆47 Updated 5 years ago
ARM-software / scalpel
This is a PyTorch implementation of the Scalpel. Node pruning for five benchmark networks and SIMD-aware weight pruning for LeNet-300-100…
☆41 Updated 7 years ago
wangmaolin / niti
Implementation of "NITI: Training Integer Neural Networks Using Integer-only Arithmetic" on arXiv
☆86 Updated 3 years ago
bwasti / pytorch_compiler_tutorial
Codebase associated with the PyTorch compiler tutorial
☆47 Updated 6 years ago
masahi / torchscript-to-tvm
☆68 Updated 2 years ago
hessamb / lcnn
LCNN: Lookup-based Convolutional Neural Network
☆52 Updated 8 years ago
Tiiiger / QPyTorch
Low Precision Arithmetic Simulation in PyTorch
☆286 Updated last year
zhuwenxi / pytorch-profiling-tool
☆54 Updated 7 years ago
houlu369 / Loss-aware-weight-quantization
Implementation of ICLR 2018 paper "Loss-aware Weight Quantization of Deep Networks"
☆27 Updated 6 years ago
wjc852456 / Neural-Networks-on-Silicon
This is a collection of works on neural networks and neural accelerators.
☆41 Updated 6 years ago
Xilinx / graffitist
Graph Transforms to Quantize and Retrain Deep Neural Nets in TensorFlow
☆168 Updated 5 years ago
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆137 Updated 3 years ago
mrusci / training-mixed-precision-quantized-networks
This repository contains the PyTorch scripts to train mixed-precision networks for microcontroller deployment, based on the memory contr…
☆50 Updated last year
anony-sub / chameleon
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
☆27 Updated 6 years ago
larq / compute-engine
Highly optimized inference engine for Binarized Neural Networks
☆251 Updated 2 weeks ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆72 Updated 9 years ago
YashasSamaga / ConvolutionBuildingBlocks
GEMM and Winograd based convolutions using CUTLASS
☆28 Updated 5 years ago
VoVAllen / tf-dlpack
DLPack for Tensorflow
☆35 Updated 5 years ago
TalwalkarLab / paleo
An analytical performance modeling tool for deep neural networks.
☆91 Updated 5 years ago
plumerai / rethinking-bnn-optimization
Implementation for the paper "Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization"
☆74 Updated 5 years ago
andersy005 / tvm-in-action
TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together
☆64 Updated 7 years ago
jafermarq / WinogradAwareNets
Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20)
☆27 Updated 2 years ago
xingyul / sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
☆193 Updated 6 years ago