kozistr / pytorch_optimizer
optimizer & lr scheduler & loss function collections in PyTorch
☆368 · Updated this week
Alternatives and similar repositories for pytorch_optimizer
Users who are interested in pytorch_optimizer compare it to the libraries listed below.
- A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to fac… ☆242 · Updated last month
- D-Adaptation for SGD, Adam and AdaGrad ☆526 · Updated 9 months ago
- Library for Jacobian descent with PyTorch. It enables the optimization of neural networks with multiple losses (e.g. multi-task learning)… ☆271 · Updated last week
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" ☆383 · Updated 2 years ago
- When it comes to optimizers, it's always better to be safe than sorry ☆375 · Updated last month
- A simple way to keep track of an Exponential Moving Average (EMA) version of your Pytorch model ☆617 · Updated 10 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆440 · Updated 5 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Updated last year
- Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" ☆426 · Updated 10 months ago
- (no description) ☆220 · Updated 10 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆132 · Updated 2 weeks ago
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆366 · Updated last year
- Implementation of the proposed minGRU in Pytorch ☆306 · Updated 7 months ago
- Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new… ☆123 · Updated last year
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆329 · Updated 10 months ago
- Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch ☆252 · Updated 3 years ago
- (no description) ☆292 · Updated 10 months ago
- Implementation of Block Recurrent Transformer - Pytorch ☆221 · Updated last year
- Implementation of Linformer for Pytorch ☆300 · Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆320 · Updated 3 months ago
- Code release for "Dropout Reduces Underfitting" ☆315 · Updated 2 years ago
- Efficient optimizers ☆275 · Updated 2 weeks ago
- A repository for log-time feedforward networks ☆222 · Updated last year
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆106 · Updated last year
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch ☆110 · Updated 2 months ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆217 · Updated 2 years ago
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆769 · Updated 3 months ago
- Accelerated First Order Parallel Associative Scan ☆189 · Updated last year
- Adaptive Gradient Clipping ☆151 · Updated 3 years ago
- An implementation of local windowed attention for language modeling ☆483 · Updated 3 months ago