Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆30 · updated Mar 28, 2024
Alternatives and similar repositories for Bonsai
Users interested in Bonsai are comparing it to the repositories listed below.
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The truth is rarely pure and never simple. · ☆27 · updated Apr 21, 2025
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference · ☆47 · updated Jun 4, 2024
- Fine-tuning Quantized Neural Networks with Zeroth-order Optimization · ☆16 · updated Sep 17, 2025
- ☆30 · updated Jul 22, 2024
- [ICLR 2026] dParallel: Learnable Parallel Decoding for dLLMs · ☆59 · updated Feb 22, 2026
- Listwise Reward Estimation for Offline Preference-based Reinforcement Learning (ICML 2024) · ☆17 · updated Jun 18, 2024
- Code Implementation for "NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models" (EMNLP …) · ☆17 · updated Oct 17, 2023
- BESA is a differentiable weight pruning technique for large language models. · ☆17 · updated Mar 4, 2024
- [NeurIPS 2023] Structural Pruning for Diffusion Models · ☆217 · updated Jul 8, 2024
- Structured Pruning Adapters in PyTorch · ☆19 · updated Aug 30, 2023
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… · ☆16 · updated Apr 21, 2025
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. · ☆180 · updated Oct 3, 2024
- [NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich… · ☆1,106 · updated Oct 7, 2024
- [ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks · ☆39 · updated Feb 4, 2025
- Exploring Model Kinship for Merging Large Language Models · ☆27 · updated Apr 16, 2025
- ☆23 · updated Nov 26, 2024
- ☆19 · updated Jan 3, 2025
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] · ☆52 · updated Dec 22, 2025
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning · ☆641 · updated Mar 4, 2024
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation · ☆151 · updated Mar 21, 2025
- For releasing code related to compression methods for transformers, accompanying our publications · ☆454 · updated Jan 16, 2025
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) · ☆20 · updated Feb 16, 2024
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models · ☆28 · updated Aug 5, 2025
- Official Code for Dataset Distillation using Neural Feature Regression (NeurIPS 2022) · ☆48 · updated Nov 12, 2022
- Code for the paper "An Information Theoretic Approach to Machine Unlearning" · ☆23 · updated Mar 20, 2025
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely · ☆24 · updated Jun 26, 2024
- ☆58 · updated Oct 6, 2023
- Structural Pruning for LLaMA · ☆54 · updated May 20, 2023
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… · ☆31 · updated Jan 28, 2026
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models · ☆59 · updated Feb 22, 2026
- AFPQ code implementation · ☆23 · updated Nov 6, 2023
- [CVPR 2025] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers · ☆74 · updated Sep 3, 2024
- ☆29 · updated Jun 11, 2023
- Code for T-MARS data filtering · ☆35 · updated Aug 23, 2023
- Implementation for the paper "CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference" · ☆35 · updated Mar 6, 2025
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models · ☆70 · updated Jan 6, 2024
- ☆63 · updated Dec 15, 2024
- The framework to prune LLMs to any size and any config. · ☆95 · updated Mar 1, 2024
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… · ☆68 · updated Jun 4, 2024