katsura-jp / pytorch-cosine-annealing-with-warmup

☆466

Alternatives and similar repositories for pytorch-cosine-annealing-with-warmup

Users who are interested in pytorch-cosine-annealing-with-warmup are comparing it to the libraries listed below.

Tony-Y / pytorch_warmup
Learning Rate Warmup in PyTorch
☆413 Updated 4 months ago
fadel / pytorch_ema
Tiny PyTorch library for maintaining a moving average of a collection of parameters.
☆438 Updated last year
xxxnell / how-do-vits-work
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
☆819 Updated 3 years ago
kekmodel / MPL-pytorch
Unofficial PyTorch implementation of "Meta Pseudo Labels"
☆389 Updated last year
ildoonet / pytorch-gradual-warmup-lr
Gradually-Warmup Learning Rate Scheduler for PyTorch
☆991 Updated last year
davda54 / sam
SAM: Sharpness-Aware Minimization (PyTorch)
☆1,925 Updated last year
lucidrains / mlp-mixer-pytorch
An All-MLP solution for Vision, from Google AI
☆1,048 Updated 3 months ago
chinhsuanwu / coatnet-pytorch
A PyTorch implementation of "CoAtNet: Marrying Convolution and Attention for All Data Sizes"
☆391 Updated 4 years ago
SHI-Labs / Compact-Transformers
Escaping the Big Data Paradigm with Compact Transformers, 2021 (Train your Vision Transformers in 30 mins on CIFAR-10 with a single GPU!)
☆536 Updated 11 months ago
locuslab / convmixer
Implementation of ConvMixer for "Patches Are All You Need? 🤷"
☆1,077 Updated 2 years ago
lucidrains / ema-pytorch
A simple way to keep track of an Exponential Moving Average (EMA) version of your Pytorch model
☆617 Updated 10 months ago
vballoli / nfnets-pytorch
NFNets and Adaptive Gradient Clipping for SGD implemented in PyTorch. Find explanation at tourdeml.github.io/blog/
☆349 Updated last year
kakaobrain / torchlars
A LARS implementation in PyTorch
☆352 Updated 5 years ago
aanna0701 / SPT_LSA_ViT
Implementation of Visual Transformer for Small-size Datasets
☆126 Updated 3 years ago
facebookresearch / msn
Masked Siamese Networks for Label-Efficient Learning (https://arxiv.org/abs/2204.07141)
☆460 Updated 3 years ago
AdeelH / pytorch-multi-class-focal-loss
An (unofficial) implementation of Focal Loss, as described in the RetinaNet paper, generalized to the multi-class case.
☆238 Updated last year
facebookresearch / convit
Code for the Convolutional Vision Transformer (ConViT)
☆469 Updated 4 years ago
tatp22 / multidim-positional-encoding
An implementation of 1D, 2D, and 3D positional encoding in Pytorch and TensorFlow
☆608 Updated last year
sail-sg / Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
☆799 Updated 4 months ago
microsoft / esvit
EsViT: Efficient self-supervised Vision Transformers
☆412 Updated 2 years ago
sooftware / pytorch-lr-scheduler
PyTorch implementation of some learning rate schedulers for deep learning researcher.
☆91 Updated 3 years ago
clovaai / AdamP
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights (ICLR 2021)
☆414 Updated 4 years ago
ildoonet / pytorch-randaugment
Unofficial PyTorch Reimplementation of RandAugment.
☆636 Updated 2 years ago
rishikksh20 / MLP-Mixer-pytorch
Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision
☆218 Updated 4 years ago
google-research / sam
☆605 Updated 2 months ago
lucidrains / linear-attention-transformer
Transformer based on a variant of attention that is linear complexity in respect to sequence length
☆801 Updated last year
lessw2020 / Ranger21
Ranger deep learning optimizer rewrite to use newest components
☆338 Updated last year
lucidrains / axial-attention
Implementation of Axial attention - attending to multi-dimensional data efficiently
☆386 Updated 4 years ago
qubvel / ttach
Image Test Time Augmentation with PyTorch!
☆1,024 Updated 2 years ago
SwinTransformer / Transformer-SSL
This is an official implementation for "Self-Supervised Learning with Swin Transformers".
☆663 Updated 4 years ago