xjjxmu / QSLAW
The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2024]
☆13 · Updated 5 months ago
Alternatives and similar repositories for QSLAW
Users interested in QSLAW are comparing it to the libraries listed below.
- ☆13 · Updated last month
- BESA is a differentiable weight pruning technique for large language models. ☆16 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models" ☆36 · Updated last year
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆21 · Updated 6 months ago
- The codebase for the paper "PPT: Token Pruning and Pooling for Efficient Vision Transformer" ☆23 · Updated 6 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆47 · Updated 3 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆20 · Updated 11 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Updated last year
- GIFT: Generative Interpretable Fine-Tuning ☆20 · Updated 7 months ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better ☆14 · Updated 3 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 11 months ago
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆39 · Updated 11 months ago
- ☆16 · Updated last year
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" ☆38 · Updated 2 months ago
- ☆15 · Updated 6 months ago
- Official code for "Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM" ☆14 · Updated last year
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 · Updated 7 months ago
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning ☆40 · Updated 2 years ago
- Official implementation of the paper "A deeper look at depth pruning of LLMs" ☆15 · Updated 9 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆36 · Updated 7 months ago
- ☆15 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆33 · Updated 10 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retention ☆65 · Updated last year
- ☆9 · Updated 8 months ago
- ☆8 · Updated 2 weeks ago
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models ☆20 · Updated 5 months ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification ☆53 · Updated 2 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models ☆33 · Updated last month
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆12 · Updated 5 months ago