osehmathias / lisa
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
☆31 · Updated last year
Alternatives and similar repositories for lisa
Users interested in lisa are comparing it to the repositories listed below.
- ☆18 · Updated 6 months ago
- A Sober Look at Language Model Reasoning ☆52 · Updated last week
- Official PyTorch Implementation of "OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning" b… ☆32 · Updated last year
- ☆83 · Updated last month
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆23 · Updated 11 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆69 · Updated 3 months ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆59 · Updated 3 months ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆117 · Updated last month
- A curated list of Model Merging methods. ☆92 · Updated 8 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆21 · Updated last year
- A block pruning framework for LLMs. ☆23 · Updated 2 weeks ago
- Codes for Merging Large Language Models ☆31 · Updated 9 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆35 · Updated 11 months ago
- ☆131 · Updated 3 weeks ago
- Official PyTorch Implementation of Our Paper Accepted at ICLR 2024 -- Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLM… ☆47 · Updated last year
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆85 · Updated 7 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆81 · Updated 3 weeks ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆97 · Updated 3 months ago
- ☆54 · Updated 5 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 · Updated 2 years ago
- ☆105 · Updated 2 months ago
- ☆29 · Updated last year
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023) ☆35 · Updated last year
- AdaMerging: Adaptive Model Merging for Multi-Task Learning (ICLR 2024)