vineeths96 / Compressed-Transformers
In this repository, we explore model compression for transformer architectures via quantization. Specifically, we apply quantization-aware training to the linear layers and report performance for 8-bit, 4-bit, 2-bit, and 1-bit (binary) quantization.
☆24Updated 4 years ago
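As a rough illustration of what quantizing a linear layer's weights to 8, 4, 2, or 1 bits can look like, here is a minimal sketch of a "fake quantization" forward pass in plain Python. It is not the repository's implementation. The function names (`fake_quantize`, `quantized_linear`), the symmetric per-row scaling, and the mean-magnitude scale for the binary case are all assumptions made for this example. In quantization-aware training, the rounding step is commonly treated as the identity in the backward pass (a straight-through estimator); the sketch covers only the forward computation.

```python
def fake_quantize(weights, bits):
    """Quantize a list of floats to a low-bit grid, then map back to floats.

    Hypothetical helper for illustration only (forward pass, no gradients).
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    if bits == 1:
        # Binary case (assumption): keep the sign, scale by the mean magnitude.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]
    # Symmetric signed grid: integers in [-qmax, qmax], e.g. qmax = 127 for 8 bits.
    # With 2 bits this gives qmax = 1, i.e. the three levels {-1, 0, +1}.
    qmax = 2 ** (bits - 1) - 1
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def quantized_linear(x, weight_rows, bias, bits):
    """Compute y = W_q x + b, fake-quantizing each weight row separately."""
    out = []
    for row, b in zip(weight_rows, bias):
        q_row = fake_quantize(row, bits)  # per-row scale is an assumption
        out.append(sum(xi * wi for xi, wi in zip(x, q_row)) + b)
    return out


if __name__ == "__main__":
    weights = [0.6, -1.0, 0.2, 0.0]
    for bits in (8, 4, 2, 1):
        print(bits, fake_quantize(weights, bits))
```

Running the script prints the dequantized weights at each bit width, which makes the growing rounding error at lower precision easy to see.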
Alternatives and similar repositories for Compressed-Transformers
Users who are interested in Compressed-Transformers are comparing it to the libraries listed below.
- [ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition☆33Updated 3 years ago
- ☆11Updated last year
- Code for High-Capacity Expert Binary Networks (ICLR 2021).☆27Updated 3 years ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa…☆33Updated 2 years ago
- Implementation of NeurIPS 2019 paper "Normalization Helps Training of Quantized LSTM"☆31Updated 11 months ago
- TernGEMM: General Matrix Multiply Library with Ternary Weights for Fast DNN Inference☆13Updated 3 years ago
- Position-based Scaled Gradient for Model Quantization and Pruning Code (NeurIPS 2020)☆26Updated 4 years ago
- Implementation of a Quantized Transformer Model☆19Updated 6 years ago
- The collection of training tricks of binarized neural networks.☆72Updated 4 years ago
- An automated 8-bit quantization conversion tool for PyTorch (post-training quantization based on KL divergence)☆33Updated 5 years ago
- This repository contains the PyTorch scripts to train mixed-precision networks for microcontroller deployment, based on the memory contr…☆50Updated last year
- Post-training sparsity-aware quantization☆34Updated 2 years ago
- Unified Normalization (ACM MM'22) By Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang P…☆34Updated 2 years ago
- [ICLR 2022] "Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, and No Retraining" by Lu Miao*, Xiaolong Luo*, T…☆30Updated 3 years ago
- Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"☆109Updated 2 years ago
- [UNDER CONSTRUCTION] Unofficial implementation of ABC-Net☆12Updated 7 years ago
- ☆21Updated 2 years ago
- BitSplit Post-training Quantization☆50Updated 3 years ago
- Pytorch implementation of BiFSMNv2, TNNLS 2023☆31Updated 2 years ago
- GBDT-NAS☆28Updated 3 years ago
- Binarize convolutional neural networks using pytorch☆146Updated 3 years ago
- OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM☆47Updated 9 months ago
- TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR.☆26Updated 2 years ago
- PyTorch Quantization Aware Training Example☆137Updated last year
- Pytorch implementation of BiFSMN, IJCAI 2022☆21Updated 2 years ago
- [ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization☆95Updated 3 years ago
- ☆28Updated 3 years ago
- 3rd place solution for NeurIPS 2019 MicroNet challenge☆35Updated 5 years ago
- Neural Architecture Search for Neural Network Libraries☆59Updated last year
- [ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binar…☆56Updated last year