majumderb / rezero

Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth"

☆416

Alternatives and similar repositories for rezero

Users who are interested in rezero are comparing it to the libraries listed below.

alphadl / lookahead.pytorch
Lookahead optimizer ("Lookahead Optimizer: k steps forward, 1 step back") for PyTorch
☆337 Updated 6 years ago
cybertronai / pytorch-lamb
Implementation of https://arxiv.org/abs/1904.00962
☆377 Updated 4 years ago
lonePatient / lookahead_pytorch
PyTorch implementation of the Lookahead Optimizer
☆195 Updated 3 years ago
mit-han-lab / lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
☆610 Updated last year
sacmehta / delight
DeLighT: Very Deep and Light-Weight Transformers
☆468 Updated 5 years ago
XuezheMax / apollo
Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization
☆182 Updated 4 years ago
PhilJd / contiguous_pytorch_params
Accelerate training by storing parameters in one contiguous chunk of memory.
☆293 Updated 5 years ago
NVIDIA / runx
Deep Learning Experiment Management
☆642 Updated 2 years ago
Yonghongwei / Gradient-Centralization
A New Optimization Technique for Deep Neural Networks
☆541 Updated 3 years ago
tatp22 / linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
☆421 Updated 3 years ago
lucidrains / sinkhorn-transformer
Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention
☆269 Updated 4 years ago
yangkky / distributed_tutorial
☆261 Updated 6 years ago
prigoyal / pytorch_memonger
Experimental ground for optimizing memory of pytorch models
☆366 Updated 7 years ago
zasdfgbnm / TorchSnooper
Debug PyTorch code using PySnooper
☆802 Updated 4 years ago
lessw2020 / Ranger-Deep-Learning-Optimizer
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
☆1,208 Updated last year
facebookresearch / adaptive-span
Transformer training code for sequential tasks
☆610 Updated 4 years ago
shaohua0116 / ICLR2020-OpenReviewData
Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
☆462 Updated 5 years ago
Lyken17 / pytorch-memonger
Sublinear memory optimization for deep learning. https://arxiv.org/abs/1604.06174
☆605 Updated 5 years ago
justheuristic / prefetch_generator
Simple package that makes your generator work in a background thread
☆282 Updated 3 years ago
Stonesjtu / Pytorch-NCE
The Noise Contrastive Estimation for softmax output written in Pytorch
☆319 Updated 6 years ago
lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆298 Updated 4 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆332 Updated 3 years ago
guolinke / TUPE
Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve exis…
☆253 Updated 4 years ago
twistedcubic / attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
☆168 Updated 4 years ago
pytorch / contrib
Implementations of ideas from recent papers
☆392 Updated 4 years ago
mpyrozhok / adamwr
Implements https://arxiv.org/abs/1711.05101 AdamW optimizer, cosine learning rate scheduler and "Cyclical Learning Rates for Training Neu…
☆153 Updated 6 years ago
michaelrzhang / lookahead
Implementation for the Lookahead Optimizer.
☆242 Updated 3 years ago
egg-west / AdamW-pytorch
Implementation and experiments for AdamW on Pytorch
☆94 Updated 6 years ago
nmhkahn / torchsummaryX
torchsummaryX: Improved visualization tool of torchsummary
☆303 Updated 3 years ago
Santosh-Gupta / SpeedTorch
Library for faster pinned CPU <-> GPU transfer in Pytorch
☆683 Updated 5 years ago