Activation-aware Singular Value Decomposition for Compressing Large Language Models
⭐ 91 · Oct 22, 2024 · Updated last year
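The core idea behind activation-aware SVD is to weight each input channel of a weight matrix by its typical activation magnitude before the low-rank truncation, so channels that carry large activations are reconstructed more faithfully. Below is a minimal NumPy sketch of that general idea, illustrative only and not the ASVD4LLM implementation; the function name `asvd_compress` and the mean-absolute-activation scaling heuristic are assumptions for the example.

```python
import numpy as np

def asvd_compress(W, X, rank):
    """Illustrative activation-aware low-rank factorization (not the official code).

    W: weight matrix, shape (out, in); X: calibration activations, shape (n, in).
    Returns A (out, rank) and B (rank, in) with W ~= A @ B.
    """
    # Per-input-channel scale from calibration activations (assumed heuristic).
    s = np.abs(X).mean(axis=0) + 1e-6
    S = np.diag(s)
    # SVD of the activation-scaled weights, so truncation error is weighted
    # toward channels with large activations.
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    # Truncate to the target rank, then fold the inverse scaling into B.
    A = U[:, :rank] * sigma[:rank]
    B = Vt[:rank] @ np.diag(1.0 / s)
    return A, B
```

With `rank` equal to the full rank of `W` the factorization is exact (since `W S S^{-1} = W`); compression comes from choosing `rank` small enough that `A` and `B` together hold fewer parameters than `W`.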
Alternatives and similar repositories for ASVD4LLM
Users interested in ASVD4LLM are comparing it to the repositories listed below.
- [ICLR 2025 🔥] SVD-LLM & [NAACL 2025 🔥] SVD-LLM V2 · ⭐ 286 · Aug 28, 2025 · Updated 6 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection · ⭐ 155 · Feb 20, 2025 · Updated last year
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition · ⭐ 18 · Apr 16, 2025 · Updated 11 months ago
- ⭐ 64 · Oct 17, 2023 · Updated 2 years ago
- ⭐ 128 · Jan 22, 2024 · Updated 2 years ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections · ⭐ 21 · Oct 15, 2024 · Updated last year
- Dataset Quantization with Active Learning based Adaptive Sampling [ECCV 2024] · ⭐ 10 · Jul 9, 2024 · Updated last year
- This repository provides the official implementation of QSVD, a method for efficient low-rank approximation that unifies Query-Key-Value … · ⭐ 25 · Dec 1, 2025 · Updated 3 months ago
- ⭐ 15 · Nov 7, 2024 · Updated last year
- [ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models · ⭐ 57 · Jun 26, 2025 · Updated 9 months ago
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding · ⭐ 14 · Jul 22, 2024 · Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://ar… · ⭐ 29 · Feb 17, 2025 · Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" · ⭐ 211 · Nov 25, 2025 · Updated 4 months ago
- ⭐ 20 · Nov 26, 2025 · Updated 4 months ago
- For releasing code related to compression methods for transformers, accompanying our publications · ⭐ 456 · Jan 16, 2025 · Updated last year
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation · ⭐ 29 · Mar 16, 2026 · Updated last week
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization · ⭐ 172 · Nov 26, 2025 · Updated 4 months ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) · ⭐ 88 · Jul 28, 2025 · Updated 7 months ago
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" · ⭐ 81 · Jul 7, 2025 · Updated 8 months ago
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models · ⭐ 28 · Aug 5, 2025 · Updated 7 months ago
- ⭐ 30 · Jul 22, 2024 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization · ⭐ 38 · Sep 24, 2024 · Updated last year
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… · ⭐ 68 · Jun 4, 2024 · Updated last year
- Official implementation of the paper "Distilling Long-tailed Datasets" [CVPR 2025] · ⭐ 21 · Aug 13, 2025 · Updated 7 months ago
- AFPQ code implementation · ⭐ 23 · Nov 6, 2023 · Updated 2 years ago
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" · ⭐ 83 · Mar 17, 2025 · Updated last year
- [ICLR 2025] Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives · ⭐ 52 · Oct 19, 2025 · Updated 5 months ago
- Benchmark tests supporting the TiledCUDA library · ⭐ 18 · Nov 19, 2024 · Updated last year
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models · ⭐ 55 · Aug 9, 2024 · Updated last year
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" · ⭐ 380 · Feb 14, 2025 · Updated last year
- LLM quantization toolkit · ⭐ 19 · Jul 4, 2025 · Updated 8 months ago
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) · ⭐ 67 · Mar 27, 2025 · Updated 11 months ago
- Reorder-based post-training quantization for large language models · ⭐ 199 · May 17, 2023 · Updated 2 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… · ⭐ 628 · Sep 11, 2024 · Updated last year
- Awesome list for LLM pruning · ⭐ 290 · Oct 11, 2025 · Updated 5 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs · ⭐ 229 · Jan 11, 2025 · Updated last year
- This repository contains code for the MicroAdam paper · ⭐ 21 · Dec 14, 2024 · Updated last year
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… · ⭐ 17 · Dec 17, 2025 · Updated 3 months ago
- ⭐ 42 · Mar 28, 2024 · Updated last year