osehmathias / lisa
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
☆24 · Updated 9 months ago
Alternatives and similar repositories for lisa:
Users who are interested in lisa are comparing it to the repositories listed below.
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 · Updated last year
- [NeurIPS-2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆75 · Updated 3 months ago
- ☆121 · Updated 5 months ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆27 · Updated 2 months ago
- A curated list of Model Merging methods. ☆89 · Updated 4 months ago
- AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024. ☆61 · Updated 2 months ago
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024] ☆39 · Updated 2 months ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆94 · Updated 2 months ago
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆47 · Updated last month
- ☆30 · Updated last year
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆70 · Updated 7 months ago
- Representation Surgery for Multi-Task Model Merging. ICML, 2024. ☆34 · Updated 3 months ago
- [ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆20 · Updated 6 months ago
- An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT ☆61 · Updated 2 weeks ago
- [NeurIPS 2024 Spotlight] EMR-Merging: Tuning-Free High-Performance Model Merging ☆42 · Updated 2 months ago
- AnchorAttention: Improved attention for LLMs long-context training ☆202 · Updated this week
- Official Pytorch Implementation of "OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning" b… ☆29 · Updated 7 months ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆94 · Updated 10 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆133 · Updated 10 months ago
- A block pruning framework for LLMs. ☆15 · Updated 6 months ago
- ☆16 · Updated last month
- ☆27 · Updated last year
- Codes for Merging Large Language Models ☆27 · Updated 5 months ago
- [EMNLP 2023 Main] Sparse Low-rank Adaptation of Pre-trained Language Models ☆70 · Updated 10 months ago
- [ATTRIB @ NeurIPS 2024 Oral] When Attention Sink Emerges in Language Models: An Empirical View ☆43 · Updated 3 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆88 · Updated 3 months ago
- ☆45 · Updated last month
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆107 · Updated 2 months ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023) ☆32 · Updated last year