zqOuO / GWT
☆13 · Updated 6 months ago
Alternatives and similar repositories for GWT
Users interested in GWT are comparing it to the libraries listed below.
- ☆33 · Updated 4 months ago
- ☆11 · Updated 4 months ago
- ☆9 · Updated 2 years ago
- Work in progress. ☆70 · Updated 2 weeks ago
- ☆53 · Updated last year
- This repository contains code for the MicroAdam paper. ☆20 · Updated 7 months ago
- ☆82 · Updated 10 months ago
- ☆81 · Updated last year
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton (see the first sketch after this list). ☆68 · Updated 11 months ago
- The evaluation framework for training-free sparse attention in LLMs. ☆83 · Updated 3 weeks ago
- An extension of the GaLore paper to perform natural gradient descent in a low-rank subspace (see the second sketch after this list). ☆17 · Updated 8 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations". ☆75 · Updated 8 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 · Updated 2 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer. ☆142 · Updated last month
- [ICLR 2023] Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation. ☆12 · Updated last year
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024). ☆32 · Updated 8 months ago
- ☆26 · Updated 8 months ago
- ☆16 · Updated last month
- Triton implementation of the HyperAttention algorithm. ☆48 · Updated last year
- Mixture of A Million Experts. ☆46 · Updated 11 months ago
- Here we will test various linear attention designs. ☆60 · Updated last year
- [NeurIPS 2024] Low-rank memory-efficient optimizer without SVD. ☆30 · Updated 2 weeks ago
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs". ☆18 · Updated last month
- ☆37 · Updated 3 months ago
- ☆48 · Updated last year
- ☆53 · Updated 9 months ago
- ☆51 · Updated 8 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models. ☆74 · Updated 8 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation). ☆42 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid. ☆87 · Updated last year
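
Two of the techniques named above are simple enough to sketch in plain PyTorch. First, the fused linear + cross-entropy entry: below is a minimal sketch of the chunking idea behind fusing the final projection with the loss, assuming a generic hidden-state/vocabulary setup. The function name `chunked_linear_cross_entropy`, the `chunk_size`, and all tensor shapes are illustrative, not taken from that repository, and a real fused Triton kernel also fuses the backward pass so logits are never stored; with plain autograd this only shows the chunked structure.

```python
# Minimal sketch: compute the final linear projection and cross-entropy loss
# chunk by chunk, so the full [N, vocab] logit matrix is never built at once.
# Names and sizes here are illustrative assumptions, not the repo's Triton code.
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=1024):
    """hidden: [N, d], weight: [vocab, d], targets: [N] -> mean cross-entropy."""
    total_loss = hidden.new_zeros(())
    n = hidden.shape[0]
    for start in range(0, n, chunk_size):
        h = hidden[start:start + chunk_size]      # [c, d]
        logits = h @ weight.t()                   # [c, vocab], one chunk at a time
        total_loss = total_loss + F.cross_entropy(
            logits, targets[start:start + chunk_size], reduction="sum")
    return total_loss / n

# Usage: matches F.cross_entropy(hidden @ weight.t(), targets) up to numerics.
hidden = torch.randn(4096, 256, requires_grad=True)
weight = torch.randn(32000, 256, requires_grad=True)
targets = torch.randint(0, 32000, (4096,))
loss = chunked_linear_cross_entropy(hidden, weight, targets)
loss.backward()
```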
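Second, the GaLore-related entries (low-rank gradient subspaces): a minimal sketch of projecting a weight gradient onto a rank-r subspace and applying the update there. The shapes, `rank`, `lr`, and the plain gradient step are assumptions for illustration; the listed extension performs natural gradient descent inside the subspace, and GaLore itself keeps optimizer state in the low-rank space and refreshes the projection periodically, neither of which is shown here.

```python
# Minimal sketch of GaLore-style low-rank gradient projection:
# take the top-r left singular vectors of the gradient, compress the gradient
# into that subspace, then project the update back to full size.
import torch

def low_rank_projection(grad, rank):
    """Return P [out, rank] built from the top-r left singular vectors of grad."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]

weight = torch.randn(512, 256, requires_grad=True)
loss = (weight @ torch.randn(256, 8)).pow(2).mean()
loss.backward()

rank, lr = 4, 1e-2
P = low_rank_projection(weight.grad, rank)   # [512, 4]
low_rank_grad = P.t() @ weight.grad          # [4, 256] compact gradient
update = P @ low_rank_grad                   # back to [512, 256]
with torch.no_grad():
    weight -= lr * update                    # plain step; real methods run the
                                             # optimizer (or NGD) in the subspace
```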