VijayLingam95 / SVFT
☆31 · Updated 6 months ago
Alternatives and similar repositories for SVFT
Users interested in SVFT are comparing it to the repositories listed below
- ☆16 · Updated 10 months ago
- ☆34 · Updated 2 years ago
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆33 · Updated 9 months ago
- ☆30 · Updated last year
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆30 · Updated 9 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆54 · Updated 2 years ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 · Updated 2 years ago
- ☆36 · Updated last year
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆38 · Updated last year
- ☆34 · Updated 5 months ago
- AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention) ☆19 · Updated last month
- ☆19 · Updated 6 months ago
- Official implementation of the paper "A deeper look at depth pruning of LLMs" ☆15 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆25 · Updated last month
- Official PyTorch implementation of DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs (ICML 2025 Oral) ☆36 · Updated 2 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆81 · Updated 2 years ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆44 · Updated last year
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 · Updated last year
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆70 · Updated 2 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆54 · Updated 6 months ago
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆23 · Updated last year
- ☆20 · Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models" ☆176 · Updated last year
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆39 · Updated 10 months ago
- Codes for Merging Large Language Models ☆33 · Updated last year
- ☆85 · Updated last year
- ☆50 · Updated last year
- Code for "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆56 · Updated last year
- Unofficial implementation of the Selective Attention Transformer ☆17 · Updated 9 months ago
- Long Context Extension and Generalization in LLMs ☆58 · Updated 11 months ago