uanu2002 / JSQ
[ICML 2024] JSQ: Compressing Large Language Models by Joint Sparsification and Quantization
☆148 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for JSQ
- [ECCV 2022] Patch Similarity Aware Data-Free Quantization for Vision Transformers ☆120 · Updated last year
- [ICCV 2023] RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers ☆113 · Updated 10 months ago
- [ICCV 2023] I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference ☆156 · Updated 2 months ago
- Large Language Model (LLM) Serving Paper and Resource List ☆13 · Updated 2 months ago
- The official implementation of the DAC 2024 paper GQA-LUT ☆10 · Updated 2 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆26 · Updated 5 months ago
- PyTorch code for Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers ☆34 · Updated 2 months ago
- A repository of Binary General Matrix Multiply (BGEMM) implemented as a custom CUDA kernel. Thanks to FP6-LLM for the wheels! ☆13 · Updated 2 months ago
- [NeurIPS 2023] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models ☆17 · Updated 11 months ago
- [ICLR 2022] SQuant ☆158 · Updated 2 years ago
- A TensorFlow.Keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio… ☆10 · Updated 4 years ago
- [ICML 2022 Spotlight] Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks ☆10 · Updated last year
- [ACL 2024] A novel QAT framework with Self-Distillation to enhance ultra-low-bit LLMs. ☆84 · Updated 6 months ago
- A deep learning framework implemented purely in NumPy, supporting both Dynamic Graph and Static Graph modes with GPU acceleration ☆261 · Updated 3 years ago
- Official PyTorch implementation of IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact ☆32 · Updated 5 months ago
- Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization ☆63 · Updated last week
- [ICLR 2024] Official PyTorch implementation of Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM… ☆37 · Updated 7 months ago
- [CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric ☆51 · Updated last year
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer ☆31 · Updated 11 months ago
- [TMLR] Official PyTorch implementation of paper "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precisio… ☆34 · Updated last month
- AFPQ code implementation ☆18 · Updated last year
- Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket ☆55 · Updated last year
- Efficient tensor decomposition-based filter pruning ☆12 · Updated 4 months ago
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24) ☆12 · Updated 4 months ago
- Post-Training Quantization for Vision Transformers ☆190 · Updated 2 years ago
- [EMNLP 2023] Official implementation of Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ☆42 · Updated last year