opooladz / Preconditioned-Stochastic-Gradient-Descent

A repo based on XiLin Li's PSGD repo that extends some of the experiments.

☆14

Alternatives and similar repositories for Preconditioned-Stochastic-Gradient-Descent

Users who are interested in Preconditioned-Stochastic-Gradient-Descent are comparing it to the libraries listed below.

apple / ml-ademamix
☆67Updated 11 months ago
ruke1ire / RTF
A State-Space Model with Rational Transfer Function Representation.
☆82Updated last year
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83Updated 10 months ago
shikaiqiu / compute-better-spent
☆58Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90Updated last year
JesseFarebro / flax-mup
Maximal Update Parametrization (μP) with Flax & Optax.
☆16Updated last year
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆86Updated 3 weeks ago
GallagherCommaJack / modulax
☆17Updated last year
ethansmith2000 / TransformerExperiments
☆19Updated 5 months ago
nikhilvyas / SOAP
☆218Updated 10 months ago
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆73Updated 2 weeks ago
google-deepmind / spectral_ssm
☆34Updated last year
cloneofsimo / zeroshampoo
☆34Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆102Updated 10 months ago
lindermanlab / elk
Scalable and Stable Parallelization of Nonlinear RNNs
☆23Updated this week
matthias-wright / jax-fid
FID computation in Jax/Flax.
☆28Updated last year
lucidrains / hl-gauss-pytorch
The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch
☆66Updated 2 months ago
kvfrans / splus
☆120Updated 4 months ago
lixilinx / psgd_torch
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆188Updated last week
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆37Updated 2 years ago
brianfitzgerald / jax-mmdit
Implementation of Diffusion Transformers and Rectified Flow in Jax
☆26Updated last year
SHI-Labs / CompactNet
☆32Updated last year
Z-T-WANG / LaProp-Optimizer
Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient"
☆29Updated 5 years ago
ml-gde / jflux
JAX Implementation of Black Forest Labs' Flux.1 family of models
☆39Updated last month
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆88Updated last year
evanatyourservice / kron_torch
An implementation of PSGD Kron second-order optimizer for PyTorch
☆96Updated 3 months ago
radarFudan / mamba-minimal-jax
☆33Updated 11 months ago
LIONS-EPFL / scion
☆41Updated this week
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25Updated 9 months ago
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam.
☆85Updated last year