kozistr / pytorch_optimizer

optimizer & lr scheduler & loss function collections in PyTorch

☆368

Alternatives and similar repositories for pytorch_optimizer

Users who are interested in pytorch_optimizer are comparing it to the libraries listed below.

meta-pytorch / torcheval
A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to fac…
☆242 Updated last month
facebookresearch / dadaptation
D-Adaptation for SGD, Adam and AdaGrad
☆526 Updated 9 months ago
TorchJD / torchjd
Library for Jacobian descent with PyTorch. It enables the optimization of neural networks with multiple losses (e.g. multi-task learning)…
☆271 Updated last week
lucidrains / memory-efficient-attention-pytorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
☆383 Updated 2 years ago
kyleliang919 / C-Optim
When it comes to optimizers, it's always better to be safe than sorry
☆375 Updated last month
lucidrains / ema-pytorch
A simple way to keep track of an Exponential Moving Average (EMA) version of your Pytorch model
☆617 Updated 10 months ago
zyushun / Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
☆440 Updated 5 months ago
nanowell / AdEMAMix-Optimizer-Pytorch
The AdEMAMix Optimizer: Better, Faster, Older.
☆186 Updated last year
iShohei220 / adopt
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate"
☆426 Updated 10 months ago
nikhilvyas / SOAP
☆220 Updated 10 months ago
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆132 Updated 2 weeks ago
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆366 Updated last year
lucidrains / minGRU-pytorch
Implementation of the proposed minGRU in Pytorch
☆306 Updated 7 months ago
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆123 Updated last year
HazyResearch / flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
☆329 Updated 10 months ago
lucidrains / Adan-pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆252 Updated 3 years ago
bobby-he / simplified_transformers
☆292 Updated 10 months ago
lucidrains / block-recurrent-transformer-pytorch
Implementation of Block Recurrent Transformer - Pytorch
☆221 Updated last year
lucidrains / linformer
Implementation of Linformer for Pytorch
☆300 Updated last year
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆320 Updated 3 months ago
facebookresearch / dropout
Code release for "Dropout Reduces Underfitting"
☆315 Updated 2 years ago
HomebrewML / HeavyBall
Efficient optimizers
☆275 Updated 2 weeks ago
pbelcak / fastfeedforward
A repository for log-time feedforward networks
☆222 Updated last year
fkodom / yet-another-retnet
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http…
☆106 Updated last year
lucidrains / gradnorm-pytorch
A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch
☆110 Updated 2 months ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆217 Updated 2 years ago
lucidrains / rotary-embedding-torch
Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch
☆769 Updated 3 months ago
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆189 Updated last year
pseeth / autoclip
Adaptive Gradient Clipping
☆151 Updated 3 years ago
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆483 Updated 3 months ago