Activation-aware Singular Value Decomposition for Compressing Large Language Models
★92 · Oct 22, 2024 · Updated last year
Alternatives and similar repositories for ASVD4LLM
Users that are interested in ASVD4LLM are comparing it to the libraries listed below.
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ★295 · Aug 28, 2025 · Updated 9 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ★159 · Feb 20, 2025 · Updated last year
- Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ★20 · Apr 16, 2025 · Updated last year
- ★65 · Oct 17, 2023 · Updated 2 years ago
- ★129 · Jan 22, 2024 · Updated 2 years ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections ★22 · Oct 15, 2024 · Updated last year
- ★15 · Nov 7, 2024 · Updated last year
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding ★14 · Jul 22, 2024 · Updated last year
- [ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models ★60 · Jun 26, 2025 · Updated 11 months ago
- This repository provides the official implementation of QSVD, a method for efficient low-rank approximation that unifies Query-Key-Value … ★27 · May 16, 2026 · Updated 3 weeks ago
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" ★218 · Nov 25, 2025 · Updated 6 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ★463 · Jan 16, 2025 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ★175 · Nov 26, 2025 · Updated 6 months ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) ★92 · Jul 28, 2025 · Updated 10 months ago
- Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ★81 · Jul 7, 2025 · Updated 11 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ★32 · Mar 28, 2024 · Updated 2 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… ★29 · Feb 17, 2025 · Updated last year
- ★30 · Jul 22, 2024 · Updated last year
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models ★30 · Aug 5, 2025 · Updated 10 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ★40 · Sep 24, 2024 · Updated last year
- Official Implementation of paper "Distilling Long-tailed Datasets" [CVPR 2025] ★21 · Aug 13, 2025 · Updated 10 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ★72 · Jun 4, 2024 · Updated 2 years ago
- ★22 · Nov 26, 2025 · Updated 6 months ago
- AFPQ code implementation ★23 · Nov 6, 2023 · Updated 2 years ago
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" ★92 · Mar 17, 2025 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ★19 · Nov 19, 2024 · Updated last year
- [ICLR 2025] Dobi-SVD : Differentiable SVD for LLM Compression and Some New Perspectives ★53 · Oct 19, 2025 · Updated 7 months ago
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation ★47 · Apr 21, 2026 · Updated last month
- Code repo for the paper "SpinQuant LLM quantization with learned rotations" ★401 · Feb 14, 2025 · Updated last year
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models ★62 · Aug 9, 2024 · Updated last year
- LLM Quantization toolkit ★20 · Updated this week
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ★69 · Mar 27, 2025 · Updated last year
- Reorder-based post-training quantization for large language model ★199 · May 17, 2023 · Updated 3 years ago
- TA's implementation for the project of Computer Architecture and Intelligent Chip Design (23 Spring) ★10 · May 20, 2023 · Updated 3 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ★650 · Sep 11, 2024 · Updated last year
- Awesome list for LLM pruning. ★298 · Oct 11, 2025 · Updated 8 months ago
- This repository contains code for the MicroAdam paper. ★21 · Dec 14, 2024 · Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ★234 · Jan 11, 2025 · Updated last year
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… ★19 · Dec 17, 2025 · Updated 5 months ago