vineeths96 / Compressed-Transformers
In this repository, we explore model compression for transformer architectures via quantization. Specifically, we apply quantization-aware training to the linear layers and report performance at 8-bit, 4-bit, 2-bit, and 1-bit (binary) precision.
☆22Updated 3 years ago
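As a rough illustration of what quantizing linear-layer weights to a given bit width involves, here is a minimal sketch in plain Python. It is not taken from the repository: the function name `fake_quantize`, the symmetric per-tensor scaling, and the sign-times-mean-magnitude rule for the 1-bit case are all assumptions chosen for illustration. It covers only the forward "quantize then dequantize" step. In quantization-aware training, this step is commonly paired with a straight-through estimator for the backward pass, which is not shown here.

```python
from typing import List


def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Quantize and immediately dequantize a list of weights.

    Hypothetical helper for illustration only (not the repository's code).
    Uses one symmetric scale for the whole list.
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    if not weights:
        return []

    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]

    if bits == 1:
        # Binary case (assumed rule): keep the sign, scale by mean magnitude.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]

    # Symmetric signed grid with integer levels in [-qmax, qmax].
    # Note: for bits == 2 this gives the three levels {-1, 0, 1}.
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax

    quantized = []
    for w in weights:
        q = round(w / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        quantized.append(q * scale)     # map back to the float domain
    return quantized


if __name__ == "__main__":
    w = [0.6, -1.0, 0.2, 0.0]
    for b in (8, 4, 2, 1):
        print(b, fake_quantize(w, b))
```

Lower bit widths leave fewer representable levels, so the gap between the original and the quantized weights grows. That trade-off is what a comparison across 8, 4, 2, and 1 bits measures.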
Related projects:
- [ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition☆30Updated 2 years ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa…☆30Updated last year
- Code for High-Capacity Expert Binary Networks (ICLR 2021).☆26Updated 2 years ago
- Position-based Scaled Gradient for Model Quantization and Pruning Code (NeurIPS 2020)☆26Updated 3 years ago
- Post-training sparsity-aware quantization☆32Updated last year
- This repository contains the PyTorch scripts to train mixed-precision networks for microcontroller deployment, based on the memory contr…☆47Updated 4 months ago
- Implementation of NeurIPS 2019 paper "Normalization Helps Training of Quantized LSTM"☆30Updated last month
- Official Repo for EdgeQAT☆12Updated 6 months ago
- Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming☆31Updated last year
- Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"☆25Updated last month
- Quantizes PyTorch models; supports post-training quantization and quantization-aware training methods☆13Updated last year
- [ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization☆96Updated 2 years ago
- ☆13Updated 10 months ago
- BitSplit Post-training Quantization☆46Updated 2 years ago
- [ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li…☆45Updated 9 months ago
- ☆17Updated last year
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar…☆54Updated 6 months ago
- ACL 2023☆38Updated last year
- ☆67Updated 2 years ago
- ☆15Updated last month
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"☆98Updated last year
- Unofficial pytorch implementation of Piecewise Linear Unit dynamic activation function☆15Updated last year
- Official PyTorch implementation of paper "Variation-aware Vision Transformer Quantization"☆33Updated 3 months ago
- The collection of training tricks of binarized neural networks.☆71Updated 3 years ago
- [ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vi…☆30Updated 6 months ago
- This project is the official implementation of our accepted ICLR 2022 paper BiBERT: Accurate Fully Binarized BERT.☆81Updated last year
- [ICLR 2022] "Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, and No Retraining" by Lu Miao*, Xiaolong Luo*, T…☆29Updated 2 years ago
- An Out-of-box PyTorch Scaffold for Neural Network Quantization-Aware-Training (QAT) Research. Website: https://github.com/zhutmost/neuralz…☆26Updated last year
- Reproducing Quantization paper PACT☆55Updated 2 years ago
- Pytorch implementation of BiFSMN, IJCAI 2022☆21Updated last year