The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆188Feb 17, 2026Updated last week
Alternatives and similar repositories for LLM-Drop
Users who are interested in LLM-Drop are comparing it to the libraries listed below.
- Source code of ACL 2023 Main Conference Paper "PAD-Net: An Efficient Framework for Dynamic Networks".☆11Updated this week
- The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for E…☆28Updated this week
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…☆29Jul 24, 2025Updated 7 months ago
- Source code of EMNLP 2022 Findings paper "SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters"☆21Updated this week
- ☆16Jul 23, 2024Updated last year
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".☆88Feb 18, 2026Updated last week
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)"☆44Feb 18, 2026Updated last week
- The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity".☆14Jul 2, 2024Updated last year
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…☆156Apr 7, 2025Updated 10 months ago
- ☆130Oct 1, 2024Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆60Feb 7, 2025Updated last year
- Codebase for Instruction Following without Instruction Tuning☆36Sep 24, 2024Updated last year
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation☆34May 28, 2025Updated 9 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆39Jan 12, 2024Updated 2 years ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆203Jul 17, 2024Updated last year
- ☆15Apr 11, 2024Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆163Apr 13, 2025Updated 10 months ago
- FocusLLM: Scaling LLM’s Context by Parallel Decoding☆44Dec 8, 2024Updated last year
- This repository contains data, code and models for contextual noncompliance.☆25Jul 18, 2024Updated last year
- UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation☆23May 16, 2025Updated 9 months ago
- ☆21Jul 25, 2025Updated 7 months ago
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆24Mar 4, 2025Updated 11 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- ☆52Jul 18, 2024Updated last year
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]☆90Sep 13, 2024Updated last year
- Dataset Reset Policy Optimization☆31Apr 12, 2024Updated last year
- GoldFinch and other hybrid transformer components☆45Jul 20, 2024Updated last year
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.☆52Oct 14, 2024Updated last year
- [TMLR 2025] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models☆125Feb 15, 2026Updated 2 weeks ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆147Sep 20, 2024Updated last year
- ☆24May 13, 2025Updated 9 months ago
- Fluid Language Model Benchmarking☆26Sep 16, 2025Updated 5 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆358Feb 5, 2026Updated 3 weeks ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs.☆77Apr 29, 2024Updated last year
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆33May 9, 2024Updated last year
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆55Oct 9, 2025Updated 4 months ago
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"☆20Apr 9, 2025Updated 10 months ago
- ☆23Sep 29, 2024Updated last year