vineeths96 / Compressed-Transformers

This repository explores model compression for transformer architectures via quantization. It focuses on quantization-aware training of the linear layers and demonstrates performance for 8-bit, 4-bit, 2-bit, and 1-bit (binary) quantization.

☆24

Alternatives and similar repositories for Compressed-Transformers

Users who are interested in Compressed-Transformers are comparing it to the libraries listed below.

kssteven418 / Q-ASR
[ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition
☆34 Updated 4 years ago
Jangho-Kim / PSG-pytorch
Position-based Scaled Gradient for Model Quantization and Pruning Code (NeurIPS 2020)
☆25 Updated 5 years ago
houlu369 / Normalized-Quantized-LSTM
Implementation of NeurIPS 2019 paper "Normalization Helps Training of Quantized LSTM"
☆31 Updated last year
Andrew-Tierno / QuantizedTransformer
Implementation of a Quantized Transformer Model
☆19 Updated 6 years ago
microsoft / only_train_once
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
☆48 Updated last year
mrusci / training-mixed-precision-quantized-networks
This repository contains the PyTorch scripts to train mixed-precision networks for microcontroller deployment, based on the memory contr…
☆50 Updated last year
1adrianb / expert-binary-networks
Code for High-Capacity Expert Binary Networks (ICLR 2021).
☆27 Updated 3 years ago
facebookresearch / bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"
☆113 Updated 2 years ago
gilshm / sparq
Post-training sparsity-aware quantization
☆34 Updated 2 years ago
lswzjuer / pytorch-quantity
An 8-bit automated quantization conversion tool for PyTorch (post-training quantization based on KL divergence)
☆32 Updated 6 years ago
1adrianb / binary-networks-pytorch
Binarize convolutional neural networks using PyTorch
☆149 Updated 3 years ago
leimao / PyTorch-Quantization-Aware-Training
PyTorch Quantization Aware Training Example
☆144 Updated last year
ModelTC / LPCV2021_Winner_Solution
☆28 Updated 4 years ago
snap-research / F8Net
[ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
☆94 Updated 3 years ago
skmhrk1209 / QuanTorch
PyTorch implementation of "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
☆56 Updated 6 years ago
ShuaiZ1037 / bnn-xnor-bireal
PyTorch implementations of BNN, XNOR-Net, and BiReal-Net
☆15 Updated 5 years ago
tigert1998 / qat
Manually implemented quantization-aware training
☆21 Updated 3 years ago
peiswang / BitSplit
BitSplit Post-training Quantization
☆50 Updated 3 years ago
WeixiangXu / STTN
☆17 Updated 3 years ago
yashbhalgat / QualcommAI-MicroNet-submission-MixNet
3rd place solution for NeurIPS 2019 MicroNet challenge
☆35 Updated 6 years ago
HuangOwen / Quantization-Variation
[TMLR] Official PyTorch implementation of paper "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precisio…
☆46 Updated last year
zhuyinheng / ABC-Net-pytorch
[UNDER CONSTRUCTION] Unofficial implementation of ABC-Net
☆12 Updated 7 years ago
VITA-Group / UVC
[ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li…
☆54 Updated last year
Qualcomm-AI-research / transformer-quantization
☆207 Updated 4 years ago
VITA-Group / Structure-LTH
[ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa…
☆33 Updated 2 years ago
zhutmost / neuralzip
An out-of-the-box PyTorch scaffold for neural network quantization-aware training (QAT) research. Website: https://github.com/zhutmost/neuralz…
☆25 Updated 2 years ago
papers-submission / CalibTIP
Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
☆35 Updated 2 years ago
kssteven418 / I-BERT
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
☆261 Updated 2 years ago
Shunli-Wang / Tiny-YOLO-LSQ
This is an implementation of YOLO using the LSQ network quantization method.
☆22 Updated 3 years ago
KwangHoonAn / PACT
Reproducing the quantization paper PACT
☆64 Updated 3 years ago