snu-mllab / LayerMerge
Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24)
☆29 · Updated 8 months ago
Alternatives and similar repositories for LayerMerge:
Users who are interested in LayerMerge are comparing it to the repositories listed below
- Is gradient information useful for pruning LLMs? · ☆43 · Updated last year
- The official repo of continuous speculative decoding · ☆24 · Updated 3 weeks ago
- [ECCV 2024] Isomorphic Pruning for Vision Models · ☆68 · Updated 9 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling · ☆34 · Updated 10 months ago
- Triton implementation of bidirectional (non-causal) linear attention · ☆46 · Updated 2 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models · ☆30 · Updated 10 months ago
- ☆13 · Updated last month
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… · ☆40 · Updated 2 years ago
- ☆16 · Updated 4 months ago
- TerDiT: Ternary Diffusion Models with Transformers · ☆69 · Updated 10 months ago
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models · ☆18 · Updated 5 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… · ☆59 · Updated 10 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… · ☆35 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… · ☆19 · Updated 4 months ago
- ☆21 · Updated 2 years ago
- BESA is a differentiable weight pruning technique for large language models · ☆16 · Updated last year
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation · ☆48 · Updated 9 months ago
- Official PyTorch implementation of Self-emerging Token Labeling · ☆33 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" · ☆95 · Updated 2 weeks ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models · ☆29 · Updated 6 months ago
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) · ☆20 · Updated 8 months ago
- The official GitHub page for the survey paper "A Survey of RWKV" · ☆25 · Updated 3 months ago
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' · ☆18 · Updated 9 months ago
- This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR 2024) · ☆29 · Updated last year
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need" · ☆51 · Updated 3 months ago
- (ICLR 2025) BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models · ☆19 · Updated 6 months ago
- GIFT: Generative Interpretable Fine-Tuning · ☆20 · Updated 6 months ago
- Work in progress · ☆56 · Updated 2 weeks ago
- Model Merging with SVD to Tie the KnOTS [ICLR 2025] · ☆52 · Updated 3 weeks ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" · ☆97 · Updated 6 months ago