Activation-aware Singular Value Decomposition for Compressing Large Language Models
★92 · Oct 22, 2024 · Updated last year
Alternatives and similar repositories for ASVD4LLM
Users that are interested in ASVD4LLM are comparing it to the libraries listed below.
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ★292 · Aug 28, 2025 · Updated 8 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ★158 · Feb 20, 2025 · Updated last year
- GitHub Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ★20 · Apr 16, 2025 · Updated last year
- ★65 · Oct 17, 2023 · Updated 2 years ago
- ★129 · Jan 22, 2024 · Updated 2 years ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections ★21 · Oct 15, 2024 · Updated last year
- Dataset Quantization with Active Learning based Adaptive Sampling [ECCV 2024] ★10 · Jul 9, 2024 · Updated last year
- ★15 · Nov 7, 2024 · Updated last year
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding ★14 · Jul 22, 2024 · Updated last year
- [ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models ★60 · Jun 26, 2025 · Updated 11 months ago
- This repository provides the official implementation of QSVD, a method for efficient low-rank approximation that unifies Query-Key-Value … ★26 · May 16, 2026 · Updated last week
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" ★216 · Nov 25, 2025 · Updated 6 months ago
- For releasing code related to compression methods for transformers, accompanying our publications ★462 · Jan 16, 2025 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ★174 · Nov 26, 2025 · Updated 6 months ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) ★90 · Jul 28, 2025 · Updated 9 months ago
- Official Pytorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" ★81 · Jul 7, 2025 · Updated 10 months ago
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models ★29 · Aug 5, 2025 · Updated 9 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ★31 · Mar 28, 2024 · Updated 2 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. The official implementation of https://arx… ★29 · Feb 17, 2025 · Updated last year
- ★30 · Jul 22, 2024 · Updated last year
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ★39 · Sep 24, 2024 · Updated last year
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ★71 · Jun 4, 2024 · Updated last year
- Official Implementation of paper "Distilling Long-tailed Datasets" [CVPR 2025] ★21 · Aug 13, 2025 · Updated 9 months ago
- ★22 · Nov 26, 2025 · Updated 6 months ago
- AFPQ code implementation ★23 · Nov 6, 2023 · Updated 2 years ago
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" ★89 · Mar 17, 2025 · Updated last year
- Benchmark tests supporting the TiledCUDA library. ★19 · Nov 19, 2024 · Updated last year
- [ICLR 2025] Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives ★53 · Oct 19, 2025 · Updated 7 months ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models ★59 · Aug 9, 2024 · Updated last year
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation ★45 · Apr 21, 2026 · Updated last month
- Code repo for the paper "SpinQuant LLM quantization with learned rotations" ★395 · Feb 14, 2025 · Updated last year
- LLM Quantization toolkit ★20 · May 2, 2026 · Updated 3 weeks ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ★68 · Mar 27, 2025 · Updated last year
- Reorder-based post-training quantization for large language model ★199 · May 17, 2023 · Updated 3 years ago
- TA's implementation for the project of Computer Architecture and Intelligent Chip Design (23 Spring) ★10 · May 20, 2023 · Updated 3 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ★645 · Sep 11, 2024 · Updated last year
- Awesome list for LLM pruning. ★296 · Oct 11, 2025 · Updated 7 months ago
- This repository contains code for the MicroAdam paper. ★21 · Dec 14, 2024 · Updated last year
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ★233 · Jan 11, 2025 · Updated last year