Prune transformer layers
☆74, updated May 30, 2024
Alternatives and similar repositories for short-transformers
Users who are interested in short-transformers are comparing it to the libraries listed below.
- A block pruning framework for LLMs (☆28, updated May 17, 2025)
- [ICML 2024] Official implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks (☆39, updated Feb 4, 2025)
- [EMNLP 2024] Quantize LLMs to extremely low bit-widths, and finetune the quantized models (☆15, updated Jul 18, 2024)
- ☆23, updated Nov 26, 2024
- Official implementation of LaCo (EMNLP 2024 Findings) (☆21, updated Oct 3, 2024)
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] (☆90, updated Sep 13, 2024)
- ☆12, updated Oct 9, 2023
- ☆30, updated Jul 22, 2024
- Implementation code for ACL 2024: Advancing Parameter Efficiency in Fine-tuning via Representation Editing (☆15, updated Apr 20, 2024)
- Official release of the benchmark in the paper "VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for…" (☆16, updated Aug 1, 2025)
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models (☆262, updated Apr 23, 2024)
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" (☆41, updated May 1, 2025)
- ☆20, updated Oct 13, 2024
- [NeurIPS '25] Multi-Token Prediction Needs Registers (☆27, updated Dec 14, 2025)
- [ICLR 2025] Official implementation of the paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models" (☆24, updated Mar 16, 2025)
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration (☆20, updated Jun 27, 2025)
- ☆25, updated Oct 31, 2024
- [CVPR '24] Official implementation of the paper "Multiflow: Shifting Towards Task-Agnostic Vision-Language Pruning" (☆23, updated Mar 7, 2025)
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression (☆73, updated Mar 25, 2025)
- [AAMAS 2026] Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization. https://blind-vla-paper.github.io (☆59, updated Jan 25, 2026)
- ☆63, updated Dec 15, 2024
- Official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" (ICML 2024) (☆106, updated Jul 1, 2024)
- ☆71, updated Aug 27, 2024
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (☆31, updated May 22, 2024)
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024) (☆31, updated Aug 15, 2024)
- Official repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) (☆67, updated Mar 27, 2025)
- InSales e-commerce platform API bindings (☆14, updated Jul 13, 2024)
- Official implementation of the transformer architecture proposed in the paper "Looped Transformers as Programmable Computers…" (☆35, updated Apr 8, 2023)
- ☆40, updated Nov 22, 2025
- ☆42, updated Mar 28, 2024
- [NeurIPS 2022 Spotlight] Official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" (☆74, updated Nov 15, 2022)
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) (☆35, updated Mar 7, 2025)
- SimADFuzz: Simulation-Feedback Fuzz Testing for Autonomous Driving Systems (☆10, updated Apr 11, 2025)
- ☆38, updated Dec 19, 2024
- ☆17, updated Feb 6, 2025
- Spherical merging of PyTorch/HF-format language models with minimal feature loss (☆145, updated Sep 10, 2023)
- Code for compression methods for transformers, accompanying our publications (☆455, updated Jan 16, 2025)
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" (☆39, updated Mar 11, 2024)
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization (☆38, updated Sep 24, 2024)
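Several of the layer-pruning repositories above (short-transformers, SLEB, LaCo, and the redundant-block identifiers) share one core idea: find the contiguous block of transformer layers whose removal changes the hidden representation the least, typically measured by angular distance between the hidden state entering the block and the one leaving it. The sketch below is an illustrative, dependency-free version of that selection step; the function names and the toy 2-D hidden states are assumptions for demonstration, not any library's actual API.

```python
import math


def angular_distance(u, v):
    """Angular distance between two vectors, normalized to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against floating-point drift outside acos's domain.
    cos = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.acos(cos) / math.pi


def pick_block_to_prune(hidden_states, n):
    """Pick the start index of the n-layer block to remove.

    hidden_states[i] is the (pooled) hidden state entering layer i, with one
    extra entry for the state leaving the final layer. The chosen block is the
    one whose input and output states are closest in angular distance, i.e.
    the block that transforms the representation the least.
    """
    best_start, best_dist = 0, float("inf")
    for start in range(len(hidden_states) - n):
        d = angular_distance(hidden_states[start], hidden_states[start + n])
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist
```

In a real pipeline the hidden states would come from running a calibration dataset through the model and averaging per-layer activations; here they are toy vectors. Once the block is chosen, pruning amounts to deleting those layers from the model's layer list and renumbering the rest.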