The official implementation of the paper "Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping (TMLR)".
☆188 · Updated Mar 6, 2026
Alternatives and similar repositories for LLM-Drop
Users interested in LLM-Drop are also comparing it to the repositories listed below.
- Source code of the ACL 2023 main conference paper "PAD-Net: An Efficient Framework for Dynamic Networks" (☆11 · updated Feb 28, 2026)
- Source code of the EMNLP 2022 Findings paper "SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters" (☆21 · updated Feb 28, 2026)
- The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for E…" (☆29 · updated Feb 28, 2026)
- Source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) (☆44 · updated Feb 28, 2026)
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) (☆89 · updated Feb 28, 2026)
- The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity" (☆15 · updated Jul 2, 2024)
- [ICLR 2025] Official PyTorch implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… (☆29 · updated Jul 24, 2025)
- ☆16 · updated Jul 23, 2024
- PyTorch code for FedHyper (☆11 · updated Aug 28, 2024)
- Layer-Condensed KV cache with 10× larger batch size, fewer parameters, and less computation; dramatic speed-up with better task performance… (☆157 · updated Apr 7, 2025)
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) (☆20 · updated Feb 16, 2024)
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models (☆61 · updated Feb 7, 2025)
- Codebase for "Instruction Following without Instruction Tuning" (☆36 · updated Sep 24, 2024)
- Official PyTorch code for the ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" (☆24 · updated Mar 4, 2025)
- Unofficial implementations of block/layer-wise pruning methods for LLMs (☆78 · updated Apr 29, 2024)
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] (☆90 · updated Sep 13, 2024)
- ☆131 · updated Oct 1, 2024
- ☆42 · updated Oct 31, 2024
- [ACL 2025 oral] SCOPE: Optimizing KV Cache Compression in Long-Context Generation (☆34 · updated May 28, 2025)
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) (☆163 · updated Apr 13, 2025)
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" (☆55 · updated Oct 9, 2025)
- ☆15 · updated Apr 11, 2024
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches (☆62 · updated Mar 4, 2025)
- Awesome list for LLM pruning (☆290 · updated Oct 11, 2025)
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning (☆642 · updated Mar 4, 2024)
- [WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful… (☆20 · updated Jan 5, 2025)
- Data, code, and models for contextual noncompliance (☆25 · updated Jul 18, 2024)
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" (☆39 · updated Jan 12, 2024)
- [ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. "Compressing LLMs: The Truth Is Rarely Pure and Never Simple" (☆27 · updated Apr 21, 2025)
- The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction" (☆51 · updated Oct 18, 2024)
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" (☆32 · updated Nov 4, 2024)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (☆204 · updated Jul 17, 2024)
- [TMLR 2025] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models (☆125 · updated Mar 6, 2026)
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding" (ACL 2024) (☆361 · updated Feb 5, 2026)
- [ICML 2024] Official implementation of "SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks" (☆39 · updated Feb 4, 2025)
- ☆27 · updated May 13, 2025
- ☆15 · updated Mar 12, 2024
- Official repository of the paper "Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models" (☆23 · updated May 27, 2025)
- A simple and effective LLM pruning approach (☆856 · updated Aug 9, 2024)