This repo contains the code for studying the interplay between quantization and sparsity methods.
☆26, updated Feb 26, 2025
Alternatives and similar repositories for quantization-sparsity-interplay
Users interested in quantization-sparsity-interplay are comparing it to the repositories listed below.
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) · ☆67, updated Mar 27, 2025
- The official implementation of the DAC 2024 paper GQA-LUT · ☆20, updated Dec 20, 2024
- Benchmarking attention mechanisms in Vision Transformers · ☆20, updated Oct 10, 2022
- A bit-level sparsity-aware multiply-accumulate processing element · ☆18, updated Jul 9, 2024
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The truth is rarely pure and never simple · ☆27, updated Apr 21, 2025
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs · ☆228, updated Jan 11, 2025
- [ICLR 2024] The official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" · ☆31, updated Mar 12, 2024
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models · ☆70, updated Jan 6, 2024
- LLM Inference with Microscaling Format · ☆34, updated Nov 12, 2024
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024) · ☆31, updated Aug 15, 2024
- [ICCV 2023] EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization · ☆28, updated Dec 6, 2023
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… · ☆82, updated Jun 30, 2024
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… · ☆41, updated Sep 9, 2025
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation · ☆34, updated May 28, 2025
- Repo hosting code and materials related to speeding up LLM inference using token merging · ☆37, updated Oct 9, 2025
- An extension of PyTorch for low-precision training / inference · ☆10, updated Aug 28, 2023
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" · ☆40, updated May 1, 2025
- This repository contains training examples for the CVPR 2018 paper "SYQ: Learning Symmetric Quantization For Efficient Deep Neural Netwo…" · ☆31, updated Jul 25, 2019
- [ICLR 2026] FastCar · ☆16, updated May 22, 2025
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" · ☆11, updated Mar 31, 2024
- A simple cycle-accurate DaDianNao simulator · ☆13, updated Mar 27, 2019
- Simple implementation of the CVPR 2024 paper "JointSQ: Joint Sparsification-Quantization for Distributed Learning" · ☆11, updated Dec 29, 2024
- [CVPR 2021] Contrastive Neural Architecture Search with Neural Architecture Comparators · ☆40, updated Apr 11, 2022
- Official implementation of the ICLR'25 paper "QERA: an Analytical Framework for Quantization Error Reconstruction" · ☆13, updated Feb 4, 2025
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" · ☆210, updated Nov 25, 2025
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference · ☆47, updated Jun 4, 2024