kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆33Jul 1, 2025Updated 7 months ago
Alternatives and similar repositories for Online-Subspace-Descent
Users that are interested in Online-Subspace-Descent are comparing it to the libraries listed below
- ☆13Jan 15, 2025Updated last year
- ☆67Mar 21, 2025Updated 10 months ago
- An extension to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆18Oct 21, 2024Updated last year
- A side project that follows all the acceleration tricks in tinyllama, with the minimal modification to the huggingface transformers code.☆13Sep 2, 2024Updated last year
- Computing the greatest common divisor with transformers, source code for the paper https://arxiv.org/abs/2308.15594☆14Aug 11, 2025Updated 6 months ago
- ☆32Aug 11, 2025Updated 6 months ago
- ☆25Oct 31, 2024Updated last year
- Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic☆32Sep 21, 2025Updated 4 months ago
- ☆47Oct 2, 2025Updated 4 months ago
- ☆54Jul 7, 2025Updated 7 months ago
- ☆31Nov 11, 2024Updated last year
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".☆123Jul 6, 2025Updated 7 months ago
- When it comes to optimizers, it's always better to be safe than sorry☆404Sep 26, 2025Updated 4 months ago
- Codes for Merging Large Language Models☆35Aug 7, 2024Updated last year
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models☆42Mar 11, 2025Updated 11 months ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning☆36Apr 4, 2024Updated last year
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection☆1,672Oct 28, 2024Updated last year
- Python package for plotting tephigrams☆11Feb 4, 2021Updated 5 years ago
- ☆10Sep 29, 2024Updated last year
- ☆52Nov 5, 2024Updated last year
- This is the code of an agentic RAG method with dynamic workflow.☆13Jan 22, 2026Updated 3 weeks ago
- STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models☆21Updated this week
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 4 months ago
- Run code-llama with 50k tokens using flash attention and better transformer☆12Nov 21, 2023Updated 2 years ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation☆48Oct 20, 2025Updated 3 months ago
- Generating Summaries with Controllable Readability Levels (EMNLP 2023)☆14Aug 6, 2025Updated 6 months ago
- Aline: Agentic Git for Vibe Coders☆36Nov 26, 2025Updated 2 months ago
- ☆13Jun 22, 2025Updated 7 months ago
- The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2…☆14Dec 7, 2024Updated last year
- ☆39Jan 16, 2026Updated last month
- Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"☆19Jul 11, 2024Updated last year
- Kraken API Guide for the Algotrading101 blog☆13Sep 4, 2022Updated 3 years ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
- Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024)☆16Sep 28, 2024Updated last year
- ☆14May 21, 2024Updated last year
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e…☆11Dec 27, 2024Updated last year
- The implementation of FedCyBGD☆11Jul 19, 2024Updated last year
- [ICML 2025] Improving Planning of Agents for Long-Horizon Tasks☆22Oct 2, 2025Updated 4 months ago
- The code for "T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval"☆21Jul 30, 2025Updated 6 months ago