hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
★88 · Updated Oct 22, 2024
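The core idea behind ASVD is to scale the weight matrix by per-channel activation statistics before applying truncated SVD, so that the low-rank factors preserve the directions that matter for typical inputs. Below is a minimal NumPy sketch of that idea, not the repository's actual implementation; the function name `asvd_compress` and the way the calibration scale is supplied are illustrative assumptions.

```python
import numpy as np

def asvd_compress(W, act_scale, rank):
    """Sketch of activation-aware SVD (illustrative, not the repo's API).

    W         : (out, in) weight matrix
    act_scale : (in,) positive per-input-channel activation scales
                (e.g. mean absolute activation from a calibration set)
    rank      : target rank for the low-rank factors
    """
    # Absorb the activation scales into the weight before decomposing
    S = np.diag(act_scale)
    U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
    # Keep only the top-`rank` singular directions
    A = U[:, :rank] * sigma[:rank]             # (out, rank)
    B = Vt[:rank] @ np.diag(1.0 / act_scale)   # (rank, in), scale removed
    return A, B                                # W is approximated by A @ B

# Usage: compress a random 64x128 weight to rank 32
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
scale = np.abs(rng.standard_normal(128)) + 0.1  # stand-in for calibration stats
A, B = asvd_compress(W, scale, rank=32)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

In a network, `A` and `B` would replace the original linear layer with two smaller ones, cutting parameters from `out * in` to `(out + in) * rank` whenever `rank` is small enough.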
Alternatives and similar repositories for ASVD4LLM
Users interested in ASVD4LLM are comparing it to the libraries listed below.
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 (★281 · Updated Aug 28, 2025)
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection (★155 · Updated Feb 20, 2025)
- (★63 · Updated Oct 17, 2023)
- GitHub repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition (★17 · Updated Apr 16, 2025)
- (★15 · Updated Nov 7, 2024)
- AFPQ code implementation (★23 · Updated Nov 6, 2023)
- Benchmark tests supporting the TiledCUDA library (★18 · Updated Nov 19, 2024)
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" (★211 · Updated Nov 25, 2025)
- TiledLower is a dataflow analysis and codegen framework written in Rust (★14 · Updated Nov 23, 2024)
- (★30 · Updated Jul 22, 2024)
- (★129 · Updated Jan 22, 2024)
- [TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" (★37 · Updated Aug 20, 2024)
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization (★172 · Updated Nov 26, 2025)
- Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models (★49 · Updated Nov 5, 2024)
- Implementation of the paper "Training Free Pretrained Model Merging" (CVPR 2024) (★32 · Updated Mar 5, 2024)
- (★40 · Updated Mar 28, 2024)
- [ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models (★55 · Updated Jun 26, 2025)
- Official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) (★88 · Updated Mar 19, 2025)
- (★38 · Updated Aug 7, 2025)
- Code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models" (★75 · Updated Mar 17, 2025)
- [NeurIPS 2024] VeLoRA: Memory-Efficient Training using Rank-1 Sub-Token Projections (★21 · Updated Oct 15, 2024)
- Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity" (★80 · Updated Jul 7, 2025)
- Code for compression methods for transformers, accompanying our publications (★455 · Updated Jan 16, 2025)
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) (★22 · Updated Jan 22, 2024)
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…" (★68 · Updated Mar 7, 2024)
- Boosting 4-bit inference kernels with 2:4 sparsity (★93 · Updated Sep 4, 2024)
- Reorder-based post-training quantization for large language models (★198 · Updated May 17, 2023)
- (★25 · Updated Oct 31, 2024)
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models (★28 · Updated Aug 5, 2025)
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs (★123 · Updated Jul 4, 2025)
- Awesome list for LLM pruning (★282 · Updated Oct 11, 2025)
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" (★372 · Updated Feb 14, 2025)
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs (★180 · Updated Oct 3, 2024)
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM (★176 · Updated Jul 12, 2024)
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" (★103 · Updated Jun 20, 2025)
- Official repo for "SparseLLM: Global Pruning of LLMs" (NeurIPS 2024) (★67 · Updated Mar 27, 2025)
- A collection of research papers on efficient training of DNNs (★70 · Updated Jul 6, 2022)
- Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024) (★16 · Updated Sep 28, 2024)
- FPGA-based HyperLogLog Accelerator (★12 · Updated Jul 13, 2020)