torch_quantizer is a out-of-box quantization tool for PyTorch models on CUDA backend, specially optimized for Diffusion Models.
☆23Mar 29, 2024Updated last year
Alternatives and similar repositories for torch_quantizer
Users that are interested in torch_quantizer are comparing it to the libraries listed below
Sorting:
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification☆31Mar 30, 2025Updated 11 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Mar 11, 2024Updated last year
- ☆16Sep 12, 2023Updated 2 years ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…☆68Jun 4, 2024Updated last year
- PyTorch implementation of "Deep Transferring Quantization" (ECCV2020)☆18Jun 22, 2022Updated 3 years ago
- The official implementation of BiViT: Extremely Compressed Binary Vision Transformers☆16Jun 18, 2023Updated 2 years ago
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…☆53Mar 25, 2025Updated 11 months ago
- Benchmarking Attention Mechanism in Vision Transformers.☆20Oct 10, 2022Updated 3 years ago
- [ICCV 2021] Official implementation of "Scalable Vision Transformers with Hierarchical Pooling"☆33Dec 30, 2021Updated 4 years ago
- TQT's pytorch implementation.☆21Dec 17, 2021Updated 4 years ago
- Pytorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training"☆12Jun 7, 2023Updated 2 years ago
- Collections of model quantization algorithms. Any issues, please contact Peng Chen (blueardour@gmail.com)☆45Aug 19, 2021Updated 4 years ago
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…☆49Oct 5, 2022Updated 3 years ago
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM☆14Dec 27, 2023Updated 2 years ago
- The official implementation of "NAS-BNN: Neural Architecture Search for Binary Neural Networks"☆13Aug 30, 2024Updated last year
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…☆50Oct 21, 2023Updated 2 years ago
- Unofficial Scalable-Softmax Is Superior for Attention☆20May 30, 2025Updated 9 months ago
- Codes for Accepted Paper : "MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization" in NeurIPS 2019☆54May 8, 2020Updated 5 years ago
- Designs for finalist teams of the DAC System Design Contest☆37Jul 8, 2020Updated 5 years ago
- ☆32Feb 29, 2024Updated 2 years ago
- A collection of object-compositional modeling by implicit neural representation.☆58Aug 12, 2023Updated 2 years ago
- [ICML 2022] ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks☆15May 18, 2022Updated 3 years ago
- [CVPR 2023] This is the official PyTorch implementation for "Dynamic Focus-aware Positional Queries for Semantic Segmentation".☆61Mar 4, 2023Updated 2 years ago
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024☆184Apr 16, 2024Updated last year
- ACL 2023☆39Jun 6, 2023Updated 2 years ago
- Pytorch implementation of our paper accepted by NeurIPS 2022 -- Learning Best Combination for Efficient N:M Sparsity☆22Jan 13, 2023Updated 3 years ago
- PyTorch implementation of Towards Efficient Training for Neural Network Quantization☆16Jan 16, 2020Updated 6 years ago
- The code for Joint Neural Architecture Search and Quantization☆14Apr 10, 2019Updated 6 years ago
- Structured Binary Neural Networks for Image Recognition☆18Nov 18, 2021Updated 4 years ago
- This is a repository of Binary General Matrix Multiply (BGEMM) by customized CUDA kernel. Thank FP6-LLM for the wheels!☆18Aug 30, 2024Updated last year
- ☆17Jul 10, 2022Updated 3 years ago
- ☆97Updated this week
- Implementation and optimization of matrix multiplication on single CPU (HPC-THU-2023-Autumn)☆18Feb 27, 2024Updated 2 years ago
- This is the official PyTorch implementation for "Sharpness-aware Quantization for Deep Neural Networks".☆44Nov 25, 2021Updated 4 years ago
- [TPAMI 2024] This is the official repository for our paper: ''Pruning Self-attentions into Convolutional Layers in Single Path''.☆115Dec 30, 2023Updated 2 years ago
- [ICCV 2023 oral] This is the official repository for our paper: ''Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning''.☆75Sep 24, 2023Updated 2 years ago
- [NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity"☆74Nov 15, 2022Updated 3 years ago
- DNN quantization with outlier channel splitting (ICML'19)☆113Mar 21, 2020Updated 5 years ago
- Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation☆16Feb 7, 2022Updated 4 years ago