PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioners, low-rank approximation preconditioner, and more)
☆190 · updated Jan 11, 2026 (last month)
Alternatives and similar repositories for psgd_torch
Users who are interested in psgd_torch are comparing it to the libraries listed below.
- An implementation of the PSGD Kron second-order optimizer for PyTorch · ☆98 · updated Jul 24, 2025 (7 months ago)
- Supporting PyTorch FSDP for optimizers · ☆84 · updated Dec 8, 2024 (last year)
- Minimal implementation of VCRec (2024) for collapse prevention · ☆18 · updated Jan 28, 2025 (last year)
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers · ☆18 · updated Jul 24, 2025 (7 months ago)
- (no description) · ☆252 · updated Dec 2, 2024 (last year)
- (no description) · ☆19 · updated Dec 4, 2025 (3 months ago)
- HomebrewNLP in JAX flavour for maintainable TPU training · ☆51 · updated Jan 20, 2024 (2 years ago)
- 🧱 Modula software package · ☆323 · updated Aug 18, 2025 (6 months ago)
- A repo based on XiLin Li's PSGD repo that extends some of the experiments · ☆14 · updated Oct 7, 2024 (last year)
- PyTorch-SSO: Scalable Second-Order methods in PyTorch · ☆148 · updated Oct 1, 2023 (2 years ago)
- Code for training on ImageNet to SOTA results using PyTorch · ☆13 · updated Aug 14, 2023 (2 years ago)
- An implementation of Shampoo · ☆78 · updated Mar 9, 2018 (7 years ago)
- A collection of niche / personally useful PyTorch optimizers with modified code · ☆27 · updated Oct 25, 2025 (4 months ago)
- A dashboard for exploring timm learning rate schedulers · ☆19 · updated Nov 22, 2024 (last year)
- Repository containing PyTorch code for EKFAC and K-FAC preconditioners · ☆153 · updated Jun 22, 2023 (2 years ago)
- Unofficial implementation of the paper "Exploring the Space of Key-Value-Query Models with Intention" · ☆12 · updated May 24, 2023 (2 years ago)
- Implementation of the PSGD optimizer in JAX · ☆35 · updated Dec 31, 2024 (last year)
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ☆362 · updated Nov 15, 2025 (3 months ago)
- ASDL: Automatic Second-order Differentiation Library for PyTorch · ☆191 · updated Dec 5, 2024 (last year)
- Schedule-Free Optimization in PyTorch · ☆2,257 · updated May 21, 2025 (9 months ago)
- Combining SOAP and Muon · ☆19 · updated Feb 11, 2025 (last year)
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" · ☆18 · updated Mar 15, 2024 (last year)
- Collection of autoregressive model implementations · ☆85 · updated Feb 23, 2026 (last week)
- (no description) · ☆28 · updated Oct 7, 2025 (4 months ago)
- (no description) · ☆93 · updated Jul 5, 2024 (last year)
- TensorFlow implementation of preconditioned stochastic gradient descent · ☆34 · updated Nov 23, 2023 (2 years ago)
- For optimization algorithm research and development · ☆557 · updated this week
- Supporting code for the blog post on modular manifolds · ☆117 · updated Sep 26, 2025 (5 months ago)
- [NeurIPS 2024] Low-rank, memory-efficient optimizer without SVD · ☆33 · updated Jul 1, 2025 (8 months ago)
- Focused on fast experimentation and simplicity · ☆80 · updated Dec 24, 2024 (last year)
- (no description) · ☆13 · updated Jan 5, 2026 (2 months ago)
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers · ☆15 · updated Feb 12, 2026 (3 weeks ago)
- Research implementation of Native Sparse Attention (arXiv 2502.11089) · ☆63 · updated Feb 19, 2025 (last year)
- (no description) · ☆55 · updated Feb 24, 2026 (last week)
- (no description) · ☆67 · updated Mar 21, 2025 (11 months ago)
- TensorDict is a PyTorch-dedicated tensor container · ☆1,009 · updated this week
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆24 · updated Jun 6, 2024 (last year)
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ☆248 · updated Jun 6, 2025 (8 months ago)
- Checkpointable dataset utilities for foundation model training · ☆32 · updated Jan 29, 2024 (2 years ago)