zqOuO / GWTLinks

☆13

Alternatives and similar repositories for GWT

Users that are interested in GWT are comparing it to the libraries listed below

Sorting:

TianjinYellow / SPAM-Optimizer
☆35Updated 8 months ago
IST-DASLab / QuEST
Work in progress.
☆75Updated 4 months ago
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆20Updated 11 months ago
LIONS-EPFL / scion
☆47Updated last month
deep-spin / adasplash
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
☆29Updated last month
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆103Updated last month
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆51Updated 3 weeks ago
selfsupervised-ai / Natural-GaLore
An extention to the GaLore paper, to perform Natural Gradient Descent in low rank subspace
☆18Updated last year
kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆30Updated 4 months ago
RobertCsordas / moeut
☆88Updated last year
andyjm3 / SLTrain
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024)
☆36Updated last year
dayal-kalra / low-memory-adam
☆13Updated 8 months ago
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆17Updated last year
insuhan / hyper-attn
☆83Updated last year
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆206Updated 5 months ago
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆44Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
samsja / muon_fsdp_2
Muon fsdp 2
☆45Updated 3 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
berlino / seq_icl
☆53Updated last year
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆102Updated last month
kyleliang919 / Super_Muon
☆65Updated 8 months ago
Leiay / looped_transformer
☆33Updated last year
ScalingIntelligence / CATS
☆31Updated last year
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆35Updated 8 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆46Updated last year
nasosger / MuToR
[NeurIPS '25] Multi-Token Prediction Needs Registers
☆24Updated 2 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆36Updated last year