Activation-aware Singular Value Decomposition for Compressing Large Language Models
★91 · Oct 22, 2024 · Updated last year
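For context, the title refers to activation-aware SVD: instead of truncating the SVD of a weight matrix W directly, input channels are first scaled by their typical activation magnitude, so the retained singular directions match where activations are actually large. Below is a minimal numpy sketch of that idea, not the repository's actual implementation; the function name and the particular channel-scale choice (mean absolute activation from a calibration batch) are illustrative assumptions.

```python
import numpy as np

def asvd_compress(W, X, rank):
    """Activation-aware truncated SVD (sketch of the ASVD idea).

    W: (out, in) weight matrix; X: (in, n) calibration activations.
    Folds a per-channel activation scale into W before the SVD, then
    un-folds it from the right factor, so W @ X is approximated well
    where activations are large.
    """
    # Per-input-channel activation scale (illustrative choice: mean |x|).
    s = np.abs(X).mean(axis=1) + 1e-8
    Ws = W * s[None, :]                  # fold scales into the weight
    U, sigma, Vt = np.linalg.svd(Ws, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]       # (out, rank)
    B = Vt[:rank, :] / s[None, :]        # (rank, in), scales un-folded
    return A, B                          # W is approximated by A @ B

# Toy usage: calibration activations with strongly uneven channel scales.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
X = rng.standard_normal((128, 256)) * np.linspace(0.1, 5.0, 128)[:, None]
A, B = asvd_compress(W, X, rank=16)
err = np.linalg.norm(W @ X - (A @ B) @ X) / np.linalg.norm(W @ X)
```

At inference time the dense layer `W @ x` is replaced by two thin layers `A @ (B @ x)`, cutting parameters from `out*in` to `rank*(out+in)` when the rank is small enough.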
Alternatives and similar repositories for ASVD4LLM
Users that are interested in ASVD4LLM are comparing it to the libraries listed below.
- [ICLR 2025 🔥] SVD-LLM & [NAACL 2025 🔥] SVD-LLM V2 · ★288 · Aug 28, 2025 · Updated 7 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection · ★154 · Feb 20, 2025 · Updated last year
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition · ★19 · Apr 16, 2025 · Updated 11 months ago
- ★64 · Oct 17, 2023 · Updated 2 years ago
- ★129 · Jan 22, 2024 · Updated 2 years ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections · ★21 · Oct 15, 2024 · Updated last year
- Dataset Quantization with Active Learning based Adaptive Sampling [ECCV 2024] · ★10 · Jul 9, 2024 · Updated last year
- This repository provides the official implementation of QSVD, a method for efficient low-rank approximation that unifies Query-Key-Value … · ★26 · Dec 1, 2025 · Updated 4 months ago
- ★15 · Nov 7, 2024 · Updated last year
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding · ★14 · Jul 22, 2024 · Updated last year
- [ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models · ★58 · Jun 26, 2025 · Updated 9 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://ar… · ★28 · Feb 17, 2025 · Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" · ★212 · Nov 25, 2025 · Updated 4 months ago
- ★21 · Nov 26, 2025 · Updated 4 months ago
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation · ★33 · Mar 24, 2026 · Updated 3 weeks ago
- For releasing code related to compression methods for transformers, accompanying our publications · ★459 · Jan 16, 2025 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization · ★170 · Nov 26, 2025 · Updated 4 months ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) · ★88 · Jul 28, 2025 · Updated 8 months ago
- Official PyTorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" · ★81 · Jul 7, 2025 · Updated 9 months ago
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models · ★28 · Aug 5, 2025 · Updated 8 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" · ★31 · Mar 28, 2024 · Updated 2 years ago
- ★30 · Jul 22, 2024 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization · ★39 · Sep 24, 2024 · Updated last year
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… · ★69 · Jun 4, 2024 · Updated last year
- Official Implementation of paper "Distilling Long-tailed Datasets" [CVPR 2025] · ★21 · Aug 13, 2025 · Updated 8 months ago
- AFPQ code implementation · ★23 · Nov 6, 2023 · Updated 2 years ago
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" · ★86 · Mar 17, 2025 · Updated last year
- Benchmark tests supporting the TiledCUDA library. · ★18 · Nov 19, 2024 · Updated last year
- [ICLR 2025] Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives · ★52 · Oct 19, 2025 · Updated 5 months ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models · ★56 · Aug 9, 2024 · Updated last year
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" · ★387 · Feb 14, 2025 · Updated last year
- LLM Quantization toolkit · ★20 · Updated this week
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) · ★68 · Mar 27, 2025 · Updated last year
- Reorder-based post-training quantization for large language models · ★199 · May 17, 2023 · Updated 2 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… · ★633 · Sep 11, 2024 · Updated last year
- Awesome list for LLM pruning. · ★287 · Oct 11, 2025 · Updated 6 months ago
- This repository contains code for the MicroAdam paper. · ★21 · Dec 14, 2024 · Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs · ★229 · Jan 11, 2025 · Updated last year
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… · ★18 · Dec 17, 2025 · Updated 3 months ago