Mrpatekful / swats
Unofficial implementation of Switching from Adam to SGD optimization in PyTorch.
☆68 Updated 2 years ago
Alternatives and similar repositories for swats
Users who are interested in swats compare it with the libraries listed below.
- PyTorch implementation of the Lookahead Optimizer ☆195 Updated 3 years ago
- PyTorch implementation of the hamburger module from the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition" ☆99 Updated 4 years ago
- Implements https://arxiv.org/abs/1711.05101 AdamW optimizer, cosine learning rate scheduler and "Cyclical Learning Rates for Training Neu… ☆153 Updated 6 years ago
- Implementation and experiments for AdamW on PyTorch ☆94 Updated 6 years ago
- Lookahead optimizer (Lookahead Optimizer: k steps forward, 1 step back) for PyTorch ☆337 Updated 6 years ago
- This is my demo of Chen et al. "GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks" ICML 2018 ☆181 Updated 4 years ago
- Multi-Task Learning Framework on PyTorch. State-of-the-art methods are implemented to effectively train models on multiple tasks. ☆149 Updated 6 years ago
- A PyTorch implementation of "LegoNet: Efficient Convolutional Neural Networks with Lego Filters" (ICML 2019). ☆140 Updated 5 years ago
- Mish Deep Learning Activation Function for PyTorch / FastAI ☆161 Updated 5 years ago
- PyTorch implementation of the basic k-means algorithm (Lloyd's method with Forgy initialization) with GPU support ☆94 Updated 7 years ago
- PyTorch implementation of Learning Rate Dropout. ☆42 Updated 6 years ago
- Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth" ☆416 Updated last year
- Robust Bi-Tempered Logistic Loss Based on Bregman Divergences. https://arxiv.org/pdf/1906.03361.pdf ☆147 Updated 3 years ago
- Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization ☆182 Updated 4 years ago
- Framework for creating (partially) reversible neural networks with PyTorch ☆155 Updated 3 years ago
- Simple package that makes your generator work in background thread ☆282 Updated 3 years ago
- Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity) ☆75 Updated 5 years ago
- Code for Multi-Head Attention: Collaborate Instead of Concatenate ☆152 Updated 2 years ago
- Official implementation of Auxiliary Learning by Implicit Differentiation [ICLR 2021] ☆86 Updated last year
- Implementations of Recent Papers in Computer Vision ☆38 Updated 3 years ago
- Loss and accuracy go opposite ways...right? ☆95 Updated 5 years ago
- [ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845 ☆120 Updated 4 years ago
- [ICML 2020] code for the flooding regularizer proposed in "Do We Need Zero Training Loss After Achieving Zero Training Error?" ☆95 Updated 2 years ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆36 Updated 5 years ago
- A pytorch dataset sampler for always sampling balanced batches. ☆118 Updated 4 years ago
- Deep Learning project template for PyTorch (multi-gpu training is supported) ☆138 Updated 2 years ago
- AdaX: Adaptive Gradient Descent with Exponential Long Term Memory ☆34 Updated 5 years ago
- PyTorch Examples repo for "ReZero is All You Need: Fast Convergence at Large Depth" ☆62 Updated last year
- ☆148 Updated 3 years ago
- Feature extraction made simple with torchextractor ☆101 Updated 4 years ago