vineeths96 / Compressed-Transformers
In this repository, we explore model compression for transformer architectures via quantization. We specifically explore quantization-aware training of the linear layers and demonstrate the performance for 8-bit, 4-bit, 2-bit, and 1-bit (binary) quantization.
May 14, 2021
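The core idea behind quantization-aware training is "fake quantization": weights are rounded to one of 2^k representable levels during the forward pass but kept in floating point, so the network learns to tolerate the quantization error. Below is a minimal, dependency-free sketch of uniform symmetric fake-quantization for the bit widths mentioned above; the function name `fake_quantize` and the mean-magnitude scaling for the binary case are illustrative assumptions, not the repository's actual implementation.

```python
def fake_quantize(weights, bits):
    """Quantize-then-dequantize a list of float weights to `bits` bits.

    Values remain floats but are restricted to at most 2**bits levels,
    mimicking the forward pass of quantization-aware training.
    (Illustrative sketch, not the repository's code.)
    """
    if bits == 1:
        # Binary quantization: sign of each weight, scaled by the
        # mean absolute weight (an assumption; BinaryConnect-style).
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]

    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    # Round to the nearest integer level, clamp to the signed range,
    # then map back to float by multiplying with the scale.
    quantized = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


weights = [0.5, -0.3, 0.8, -0.1]
w8 = fake_quantize(weights, 8)   # near-lossless at 8 bits
w1 = fake_quantize(weights, 1)   # only the sign survives at 1 bit
```

At 8 bits the dequantized values are close to the originals, while at 1 bit every weight collapses to a single shared magnitude with its original sign, which is why accuracy typically degrades as the bit width shrinks.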
