Activation-aware Singular Value Decomposition for Compressing Large Language Models
⭐88 · Oct 22, 2024 · Updated last year
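For context on the technique the repo is named after: ASVD roughly works by scaling each weight column by a statistic of its input activations before truncating the SVD, so channels that typically carry large activations are approximated more faithfully, then folding the scaling back into one of the low-rank factors. A minimal NumPy sketch of that idea (the function name, the mean-|x| scaling choice, and the sqrt split of the singular values are illustrative assumptions, not the repo's actual API):

```python
import numpy as np

def asvd_compress(W, X, rank):
    """Activation-aware low-rank factorization (illustrative sketch).

    W:    (d_out, d_in) weight matrix
    X:    (n_samples, d_in) calibration activations
    rank: target rank r << min(d_out, d_in)
    Returns A (d_out, r), B (r, d_in) with W ≈ A @ B.
    """
    # Per-input-channel scale from calibration data: channels with larger
    # typical activations get more weight in the SVD objective.
    s = np.abs(X).mean(axis=0) + 1e-6           # (d_in,), keep invertible
    Ws = W * s                                   # == W @ diag(s)
    U, sigma, Vt = np.linalg.svd(Ws, full_matrices=False)
    Ur, sr, Vtr = U[:, :rank], sigma[:rank], Vt[:rank]
    # Split sqrt of singular values between the factors, and fold
    # diag(s)^{-1} back so A @ B approximates the original W.
    A = Ur * np.sqrt(sr)                         # (d_out, r)
    B = (np.sqrt(sr)[:, None] * Vtr) / s         # (r, d_in)
    return A, B

# Toy check: factor a 64x64 weight at rank 8 using random "activations".
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
X = rng.standard_normal((256, 64))
A, B = asvd_compress(W, X, rank=8)
print(A.shape, B.shape)  # (64, 8) (8, 64)
```

Replacing a linear layer's weight with the pair `A, B` turns one `d_out × d_in` matmul into two thin matmuls, which is where the parameter and compute savings come from when `r` is small.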
Alternatives and similar repositories for ASVD4LLM
Users that are interested in ASVD4LLM are comparing it to the libraries listed below
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 · ⭐282 · Aug 28, 2025 · Updated 6 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection · ⭐154 · Feb 20, 2025 · Updated last year
- ⭐63 · Oct 17, 2023 · Updated 2 years ago
- ⭐15 · Nov 7, 2024 · Updated last year
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition · ⭐18 · Apr 16, 2025 · Updated 10 months ago
- AFPQ code implementation · ⭐23 · Nov 6, 2023 · Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library · ⭐18 · Nov 19, 2024 · Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" · ⭐210 · Nov 25, 2025 · Updated 3 months ago
- TiledLower is a dataflow analysis and codegen framework written in Rust · ⭐14 · Nov 23, 2024 · Updated last year
- ⭐30 · Jul 22, 2024 · Updated last year
- ⭐129 · Jan 22, 2024 · Updated 2 years ago
- [TMLR] Official PyTorch implementation of "Efficient Quantization-aware Training with Adaptive Coreset Selection" · ⭐37 · Aug 20, 2024 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization · ⭐172 · Nov 26, 2025 · Updated 3 months ago
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models · ⭐49 · Nov 5, 2024 · Updated last year
- Implementation of the paper "Training Free Pretrained Model Merging" (CVPR 2024) · ⭐33 · Mar 5, 2024 · Updated 2 years ago
- ⭐42 · Mar 28, 2024 · Updated last year
- [ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models · ⭐57 · Jun 26, 2025 · Updated 8 months ago
- ⭐38 · Aug 7, 2025 · Updated 6 months ago
- Official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) · ⭐89 · Updated this week
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" · ⭐81 · Jul 7, 2025 · Updated 8 months ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections · ⭐21 · Oct 15, 2024 · Updated last year
- Code for compression methods for transformers, accompanying our publications · ⭐455 · Jan 16, 2025 · Updated last year
- Code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" · ⭐80 · Mar 17, 2025 · Updated 11 months ago
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) · ⭐22 · Jan 22, 2024 · Updated 2 years ago
- Boosting 4-bit inference kernels with 2:4 sparsity · ⭐93 · Sep 4, 2024 · Updated last year
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…" · ⭐69 · Mar 7, 2024 · Updated last year
- Reorder-based post-training quantization for large language models · ⭐199 · May 17, 2023 · Updated 2 years ago
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models · ⭐28 · Aug 5, 2025 · Updated 7 months ago
- ⭐25 · Oct 31, 2024 · Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ⭐122 · Jul 4, 2025 · Updated 8 months ago
- Awesome list for LLM pruning · ⭐288 · Oct 11, 2025 · Updated 4 months ago
- Code repo for the paper "SpinQuant: LLM Quantization with Learned Rotations" · ⭐374 · Feb 14, 2025 · Updated last year
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs · ⭐180 · Oct 3, 2024 · Updated last year
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) · ⭐67 · Mar 27, 2025 · Updated 11 months ago
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" · ⭐102 · Jun 20, 2025 · Updated 8 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM · ⭐179 · Jul 12, 2024 · Updated last year
- A collection of research papers on efficient training of DNNs · ⭐69 · Jul 6, 2022 · Updated 3 years ago
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding · ⭐14 · Jul 22, 2024 · Updated last year
- Implementation of Hyena Hierarchy in JAX · ⭐10 · Apr 30, 2023 · Updated 2 years ago