loshchil / AdamW-and-SGDW

Decoupled Weight Decay Regularization (ICLR 2019)

☆282

Alternatives and similar repositories for AdamW-and-SGDW

Users who are interested in AdamW-and-SGDW are comparing it to the repositories listed below.

loshchil / SGDR
☆252Updated 8 years ago
rahulkidambi / AccSGD
Implements pytorch code for the Accelerated SGD algorithm.
☆215Updated 7 years ago
michaelrzhang / lookahead
Implementation for the Lookahead Optimizer.
☆243Updated 3 years ago
pytorch / contrib
Implementations of ideas from recent papers
☆392Updated 4 years ago
idiap / importance-sampling
Code for experiments regarding importance sampling for training neural networks
☆329Updated 3 years ago
renmengye / revnet-public
Code for "The Reversible Residual Network: Backpropagation Without Storing Activations"
☆362Updated 7 years ago
prigoyal / pytorch_memonger
Experimental ground for optimizing memory of pytorch models
☆367Updated 7 years ago
t-vi / pytorch-tvmisc
Totally Versatile Miscellanea for Pytorch
☆475Updated 3 years ago
keskarnitish / large-batch-training
Code to reproduce some of the figures in the paper "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
☆145Updated 8 years ago
hongyi-zhang / Fixup
A Re-implementation of Fixed-update Initialization
☆155Updated 6 years ago
google / bi-tempered-loss
Robust Bi-Tempered Logistic Loss Based on Bregman Divergences. https://arxiv.org/pdf/1906.03361.pdf
☆147Updated 3 years ago
lonePatient / lookahead_pytorch
PyTorch implementation of the Lookahead Optimizer
☆195Updated 3 years ago
Cohere-Labs-Community / Targeted-Dropout
Complementary code for the Targeted Dropout paper
☆254Updated 6 years ago
rdevon / cortex
A machine learning library for PyTorch
☆94Updated 3 years ago
csrhddlam / pytorch-checkpoint
☆165Updated 6 years ago
sgugger / Adam-experiments
Experiments with Adam/AdamW/amsgrad
☆201Updated 7 years ago
BayesWatch / sequential-imagenet-dataloader
A plug-in replacement for DataLoader to load Imagenet disk-sequentially in PyTorch.
☆239Updated 4 years ago
gaohuang / SnapshotEnsemble
Snapshot Ensembles in Torch (Snapshot Ensembles: Train 1, Get M for Free)
☆188Updated 8 years ago
uber-research / intrinsic-dimension
☆219Updated 7 years ago
gbaydin / hypergradient-descent
Hypergradient descent
☆148Updated last year
egg-west / AdamW-pytorch
Implementation and experiments for AdamW on Pytorch
☆94Updated 5 years ago
eladhoffer / bigBatch
Code used to generate the results appearing in "Train longer, generalize better: closing the generalization gap in large batch training o…
☆149Updated 8 years ago
seba-1511 / lstms.pth
PyTorch implementations of LSTM Variants (Dropout + Layer Norm)
☆137Updated 4 years ago
yaroslavvb / kfac_pytorch
☆133Updated 8 years ago
bayesgroup / variational-dropout-sparsifies-dnn
Sparse Variational Dropout, ICML 2017
☆313Updated 5 years ago
eladhoffer / utils.pytorch
Utilities for Pytorch
☆88Updated 3 years ago
xgastaldi / shake-shake
2.86% and 15.85% on CIFAR-10 and CIFAR-100
☆297Updated 7 years ago
pluskid / fitting-random-labels
Example code for the paper "Understanding deep learning requires rethinking generalization"
☆178Updated 5 years ago
mariogeiger / hessian
hessian in pytorch
☆187Updated 5 years ago
modestyachts / CIFAR-10.1
Release of CIFAR-10.1, a new test set for CIFAR-10.
☆224Updated 5 years ago