PyTorch code for the paper "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"
☆25 · Sep 27, 2023 · Updated 2 years ago
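For context, the core idea in QA-LoRA is to keep the frozen base weights in group-wise quantized form while the low-rank adapter acts on group-averaged inputs, so the learned update can later be folded into the per-group quantization parameters. Below is a minimal, self-contained PyTorch sketch of that idea; the class name, hyperparameters, and the on-the-fly dequantization are illustrative assumptions, not this repository's actual API.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class QALoRALinearSketch(nn.Module):
    """Sketch of a quantization-aware LoRA linear layer (illustrative, not the repo's API)."""

    def __init__(self, in_features, out_features, rank=16, group_size=32, bits=4):
        super().__init__()
        assert in_features % group_size == 0
        self.group_size = group_size
        self.num_groups = in_features // group_size

        # Frozen base weight; a real implementation would store packed int4 weights
        # plus per-group scales and zero points instead of quantizing random floats.
        weight = torch.randn(out_features, in_features)
        q_weight, scales, zeros = self._groupwise_quantize(weight, bits)
        self.register_buffer("q_weight", q_weight)
        self.register_buffer("scales", scales)
        self.register_buffer("zeros", zeros)

        # LoRA factors: A consumes the group-pooled input, B maps up to out_features.
        self.lora_A = nn.Parameter(torch.zeros(rank, self.num_groups))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def _groupwise_quantize(self, w, bits):
        # Asymmetric min/max quantization per (output row, input group).
        wg = w.view(w.shape[0], self.num_groups, self.group_size)
        w_min = wg.min(dim=-1, keepdim=True).values
        w_max = wg.max(dim=-1, keepdim=True).values
        scales = (w_max - w_min).clamp(min=1e-8) / (2 ** bits - 1)
        zeros = (-w_min / scales).round()
        q = (wg / scales + zeros).round().clamp(0, 2 ** bits - 1)
        return q, scales, zeros

    def forward(self, x):
        # Dequantize the frozen base weight on the fly for the dense matmul.
        w = ((self.q_weight - self.zeros) * self.scales).view(self.q_weight.shape[0], -1)
        base = F.linear(x, w)
        # Group-wise average pooling of the input is what keeps the LoRA update
        # mergeable into the per-group zero points after training.
        pooled = x.reshape(*x.shape[:-1], self.num_groups, self.group_size).mean(dim=-1)
        return base + F.linear(F.linear(pooled, self.lora_A), self.lora_B)


if __name__ == "__main__":
    layer = QALoRALinearSketch(in_features=128, out_features=64)
    print(layer(torch.randn(2, 128)).shape)  # torch.Size([2, 64])
```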
Alternatives and similar repositories for qa-lora
Users who are interested in qa-lora are comparing it to the libraries listed below.
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models ☆49 · Nov 5, 2024 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated last year
- ☆20 · Dec 5, 2024 · Updated last year
- ☆15 · Sep 24, 2023 · Updated 2 years ago
- Fork of Flame repo for training of some new stuff in development ☆19 · Feb 20, 2026 · Updated last week
- Code for my ICLR 2024 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models" ☆16 · May 26, 2023 · Updated 2 years ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Apr 15, 2024 · Updated last year
- ☆52 · Nov 5, 2024 · Updated last year
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference ☆20 · Jan 24, 2025 · Updated last year
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆55 · Oct 9, 2025 · Updated 4 months ago
- Official PyTorch implementation of QA-LoRA ☆145 · Mar 13, 2024 · Updated last year
- Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration ☆29 · Nov 22, 2025 · Updated 3 months ago
- ☆25 · Oct 31, 2024 · Updated last year
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆24 · Oct 5, 2024 · Updated last year
- This is the official project of paper: Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conver… ☆22 · Nov 18, 2024 · Updated last year
- ViTALiTy (HPCA'23) Code Repository ☆23 · Mar 13, 2023 · Updated 2 years ago
- [COLM 2025] DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation; Zhihu: https://zhuanlan.zhihu.c… ☆29 · Mar 5, 2025 · Updated 11 months ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 · Jun 27, 2025 · Updated 8 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆101 · May 30, 2023 · Updated 2 years ago
- AFPQ code implementation ☆23 · Nov 6, 2023 · Updated 2 years ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ☆74 · Jul 14, 2025 · Updated 7 months ago
- Repo for the EMNLP'24 paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same… ☆61 · Aug 26, 2025 · Updated 6 months ago
- ☆50 · Jun 16, 2025 · Updated 8 months ago
- Implementation of Microscaling data formats in SystemVerilog. ☆29 · Jul 6, 2025 · Updated 7 months ago
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ☆69 · Mar 7, 2024 · Updated last year
- ☆30 · Jul 22, 2024 · Updated last year
- EQ-Net [ICCV 2023] ☆30 · Aug 15, 2023 · Updated 2 years ago
- ☆25 · Dec 11, 2021 · Updated 4 years ago
- My implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆33 · Aug 14, 2024 · Updated last year
- [ICLR 2025] OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… ☆88 · Apr 8, 2025 · Updated 10 months ago
- PyTorch implementation of BiFSMNv2, TNNLS 2023 ☆35 · Feb 10, 2023 · Updated 3 years ago
- [TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆37 · Aug 20, 2024 · Updated last year
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ☆134 · May 16, 2024 · Updated last year
- Kick-off repository for starting with Kaggle! ☆12 · Dec 4, 2024 · Updated last year
- [ICCV 2023] EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization ☆28 · Dec 6, 2023 · Updated 2 years ago
- Physical Downlink Shared Channel (PDSCH) in 5G New Radio. ☆12 · Jan 29, 2024 · Updated 2 years ago
- ☆46 · Sep 27, 2025 · Updated 5 months ago
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 · Apr 29, 2025 · Updated 10 months ago
- A training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity ☆43 · May 24, 2025 · Updated 9 months ago