NVlabs / EoRA
EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
☆25 · Updated 3 months ago
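For context on the technique named in the title, the sketch below illustrates the general idea of fine-tuning-free compensation: approximate the error introduced by compression (quantization or pruning) with a low-rank term computed in an eigenspace derived from calibration activations. This is a minimal, hedged illustration in PyTorch, not the official EoRA implementation; the function name `lowrank_compensation`, the chosen rank, the calibration data, and the covariance-based projection are illustrative assumptions.

```python
# Minimal sketch (assumption: not the official EoRA code) of low-rank
# compensation for a compressed weight matrix, computed in an eigenspace
# derived from input-activation statistics.
import torch

def lowrank_compensation(W, W_compressed, X_calib, rank=16):
    """Return (B, A) such that W_compressed + B @ A approximates W.

    W, W_compressed : (out_features, in_features) original / compressed weights
    X_calib         : (n_samples, in_features) calibration activations
    rank            : rank of the compensation term
    """
    # Eigenspace of the input-activation covariance (assumption: a simple
    # stand-in for an activation-derived eigenspace projection).
    cov = X_calib.T @ X_calib / X_calib.shape[0]       # (in, in)
    eigvals, eigvecs = torch.linalg.eigh(cov)          # ascending eigenvalues
    scale = eigvals.clamp_min(1e-8).sqrt()             # weight directions by energy

    # Project the compression residual into that eigenspace, take its best
    # rank-r approximation there, then map back to the original basis.
    residual = W - W_compressed                        # (out, in)
    residual_proj = residual @ eigvecs * scale         # emphasize high-energy directions
    U, S, Vh = torch.linalg.svd(residual_proj, full_matrices=False)
    B = U[:, :rank] * S[:rank]                         # (out, rank)
    A = (Vh[:rank] / scale) @ eigvecs.T                # (rank, in), undo the projection
    return B, A

if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(256, 512)
    W_c = W + 0.05 * torch.randn_like(W)               # stand-in for a quantized/pruned weight
    X = torch.randn(1024, 512)                         # synthetic calibration activations
    B, A = lowrank_compensation(W, W_c, X, rank=32)
    err_before = (X @ (W - W_c).T).norm().item()
    err_after = (X @ (W - (W_c + B @ A)).T).norm().item()
    print(f"output error before: {err_before:.2f}, after: {err_after:.2f}")
```

In practice, such a compensation pair (B, A) can be attached to the compressed layer as an additional low-rank branch at inference time, much like a LoRA adapter, without any further training.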
Alternatives and similar repositories for EoRA
Users that are interested in EoRA are comparing it to the libraries listed below
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models" ☆57 · Updated 4 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆28 · Updated 8 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆80 · Updated last year
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆61 · Updated 4 months ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆54 · Updated 11 months ago
- ☆30 · Updated last year
- ☆146 · Updated 8 months ago
- ☆29 · Updated 11 months ago
- LLM Inference with Microscaling Format ☆32 · Updated 11 months ago
- ☆46 · Updated last year
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆21 · Updated 10 months ago
- ☆130 · Updated 5 months ago
- ☆18 · Updated 10 months ago
- ☆58 · Updated last year
- Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference ☆47 · Updated last year
- ☆23 · Updated last year
- ☆60 · Updated this week
- Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ☆73 · Updated 4 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Updated last year
- ☆83 · Updated 9 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆38 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆102 · Updated 3 weeks ago
- Fast and memory-efficient exact attention ☆72 · Updated 8 months ago
- ☆26 · Updated 7 months ago
- ☆21 · Updated 8 months ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models ☆45 · Updated last year
- AFPQ code implementation ☆23 · Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆36 · Updated last year
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) ☆70 · Updated 3 months ago
- ☆36 · Updated last year