amazon-science / mezo_svrg
Code for the ICML 2024 paper: "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models"
☆11 · Updated last year
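The title points at two ingredients: a zeroth-order (forward-only) gradient estimator and SVRG-style variance reduction. As a rough illustration of how those pieces can combine, here is a minimal NumPy sketch on a toy least-squares loss; the function names (`zo_grad`, `mezo_svrg_step`), the single-direction anchor estimate, and all hyperparameters are illustrative assumptions, not the repository's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta, batch):
    # Toy quadratic stand-in for a fine-tuning loss (forward pass only).
    X, y = batch
    return 0.5 * np.mean((X @ theta - y) ** 2)

def zo_grad(theta, batch, z, mu=1e-3):
    # Two-point (SPSA-style) zeroth-order estimate along direction z:
    # two forward passes of the loss, no backpropagation.
    return (loss(theta + mu * z, batch) - loss(theta - mu * z, batch)) / (2 * mu) * z

def mezo_svrg_step(theta, anchor, g_anchor, batch, lr=1e-2, mu=1e-3):
    # SVRG-style control variate: correct the minibatch estimate at theta
    # with the estimate at the anchor point, using the same direction z and
    # the same batch so the two estimates are correlated.
    z = rng.standard_normal(theta.shape)
    g = zo_grad(theta, batch, z, mu) - zo_grad(anchor, batch, z, mu) + g_anchor
    return theta - lr * g

# Toy usage: periodic full-batch anchors, minibatch inner steps.
d, n = 10, 256
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)
theta = np.zeros(d)
for epoch in range(10):
    anchor = theta.copy()
    # Full-batch anchor gradient, itself estimated with zeroth-order queries.
    g_anchor = zo_grad(anchor, (X, y), rng.standard_normal(d))
    for _ in range(20):
        idx = rng.integers(0, n, size=32)
        theta = mezo_svrg_step(theta, anchor, g_anchor, (X[idx], y[idx]))
print("final full-batch loss:", loss(theta, (X, y)))
```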
Alternatives and similar repositories for mezo_svrg
Users interested in mezo_svrg are comparing it to the libraries listed below.
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective" ☆63 · Updated 9 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆67 · Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆113 · Updated last year
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆36 · Updated last year
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆120 · Updated 5 months ago
- ☆216 · Updated last month
- Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral) ☆54 · Updated 6 months ago
- PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models (NeurIPS 2024 Spotlight) ☆405 · Updated 6 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Updated 2 years ago
- An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT ☆131 · Updated 9 months ago
- ☆61 · Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆36 · Updated last year
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆24 · Updated last year
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆111 · Updated 3 weeks ago
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters ☆45 · Updated 4 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆74 · Updated last year
- [ICLR'24] "DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training" by Aochuan Chen*, Yimeng Zhang*, Jinghan Jia, James Di… ☆68 · Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models" ☆187 · Updated last year
- ☆36 · Updated 9 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆82 · Updated last year
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆21 · Updated last year
- State-of-the-art Parameter-Efficient MoE Fine-tuning Method ☆200 · Updated last year
- ☆85 · Updated last month
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆176 · Updated last year
- [ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆102 · Updated 6 months ago
- ☆28 · Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆66 · Updated 9 months ago
- ☆23 · Updated last year
- [ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆38 · Updated 10 months ago
- Awesome LLM pruning papers: an all-in-one repository integrating useful resources and insights. ☆142 · Updated 4 months ago