[NeurIPS 2024] Low-rank, memory-efficient optimizer without SVD
☆33Jul 1, 2025Updated 8 months ago
Alternatives and similar repositories for Online-Subspace-Descent
Users who are interested in Online-Subspace-Descent are comparing it to the repositories listed below
- ☆10Feb 12, 2024Updated 2 years ago
- ☆13Jan 15, 2025Updated last year
- An extension to the GaLore paper that performs Natural Gradient Descent in a low-rank subspace☆18Oct 21, 2024Updated last year
- Computing the greatest common divisor with transformers, source code for the paper https://arxiv.org/abs/2308.15594☆14Aug 11, 2025Updated 6 months ago
- ☆32Aug 11, 2025Updated 6 months ago
- ☆25Oct 31, 2024Updated last year
- Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic☆32Feb 18, 2026Updated 2 weeks ago
- Resa: Transparent Reasoning Models via SAEs☆47Sep 23, 2025Updated 5 months ago
- ☆47Oct 2, 2025Updated 5 months ago
- ☆55Jul 7, 2025Updated 8 months ago
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".☆124Jul 6, 2025Updated 8 months ago
- ☆32Nov 11, 2024Updated last year
- When it comes to optimizers, it's always better to be safe than sorry☆407Sep 26, 2025Updated 5 months ago
- Codes for Merging Large Language Models☆35Aug 7, 2024Updated last year
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models☆43Mar 11, 2025Updated 11 months ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning☆36Apr 4, 2024Updated last year
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection☆1,678Oct 28, 2024Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 5 months ago
- ☆10Sep 29, 2024Updated last year
- ☆52Nov 5, 2024Updated last year
- Code for an agentic RAG method with a dynamic workflow.☆12Jan 22, 2026Updated last month
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation☆48Oct 20, 2025Updated 4 months ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models☆53Aug 9, 2024Updated last year
- Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024)☆16Sep 28, 2024Updated last year
- Seamless Voice Interactions with LLMs☆12Oct 28, 2023Updated 2 years ago
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations☆19Jan 19, 2025Updated last year
- Official codebase for our paper "Do Language Models Use Their Depth Efficiently?"☆29Jun 25, 2025Updated 8 months ago
- ☆15Aug 19, 2025Updated 6 months ago
- Easy local FLUX.1 Inference☆10Aug 29, 2024Updated last year
- ☆13Feb 20, 2026Updated 2 weeks ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
- STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models☆36Feb 12, 2026Updated 3 weeks ago
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e…☆11Dec 27, 2024Updated last year
- ☆40Jan 16, 2026Updated last month
- ☆11Dec 23, 2022Updated 3 years ago
- ☆14May 21, 2024Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)☆11Apr 18, 2025Updated 10 months ago
- ☆17Dec 23, 2025Updated 2 months ago
- Counterfactual Explanation Based on Gradual Construction for Deep Networks Pytorch☆11Apr 7, 2021Updated 4 years ago