hendrycks / GELUs

A smoother activation function (undergrad code)

☆115

Alternatives and similar repositories for GELUs

Users who are interested in GELUs are comparing it to the repositories listed below

rosinality / adaptive-softmax-pytorch
Adaptive Softmax implementation for PyTorch
☆81 Updated 6 years ago
vene / sparse-structured-attention
Sparse and structured neural attention mechanisms
☆224 Updated 5 years ago
google-deepmind / lamb
LAnguage Modelling Benchmarks
☆138 Updated 5 years ago
seba-1511 / lstms.pth
PyTorch implementations of LSTM Variants (Dropout + Layer Norm)
☆137 Updated 4 years ago
stefbraun / rnn_benchmarks
RNN benchmarks of PyTorch, TensorFlow and Theano
☆89 Updated 7 years ago
titu1994 / keras-LAMB-Optimizer
Implementation of the LAMB optimizer for Keras from the paper "Reducing BERT Pre-Training Time from 3 Days to 76 Minutes"
☆75 Updated 6 years ago
epfml / collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
☆151 Updated 2 years ago
tbachlechner / ReZero-examples
PyTorch Examples repo for "ReZero is All You Need: Fast Convergence at Large Depth"
☆62 Updated last year
uber-research / loss-change-allocation
☆61 Updated 2 years ago
jojonki / Gated-Convolutional-Networks
A PyTorch implementation of "Language Modeling with Gated Convolutional Networks".
☆102 Updated 3 years ago
cybertronai / transformer-xl
Training Transformer-XL on 128 GPUs
☆141 Updated 5 years ago
cerebroai / reformers
Efficient Transformers for research, PyTorch and TensorFlow using Locality Sensitive Hashing
☆95 Updated 5 years ago
tanyuqian / learning-data-manipulation
NeurIPS 2019 - Learning Data Manipulation for Augmentation and Weighting
☆110 Updated 5 years ago
demelin / Noise-Contrastive-Estimation-NCE-for-pyTorch
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation pr…
☆44 Updated 6 years ago
anandsaha / nips.cocob.pytorch
PyTorch implementation of the NIPS'17 paper Training Deep Networks without Learning Rates Through Coin Betting.
☆37 Updated 7 years ago
khakhulin / compressed-transformer
Compression of NMT transformer model with tensor methods
☆47 Updated 6 years ago
ofirpress / PartialShuffle
☆14 Updated 6 years ago
AndreasMadsen / stable-nalu
Code for Neural Arithmetic Units (ICLR) and Measuring Arithmetic Extrapolation Performance (SEDL|NeurIPS)
☆145 Updated 4 years ago
CyberZHG / keras-adaptive-softmax
Adaptive embedding and softmax
☆17 Updated 3 years ago
mrahtz / humble-gumbel
Jupyter notebook on Gumbel-max and Gumbel-softmax tricks
☆41 Updated 2 years ago
archsyscall / aquvitae
Knowledge Distillation Toolkit
☆88 Updated 5 years ago
phohenecker / pytorch-transformer
A PyTorch implementation of the Transformer model from "Attention Is All You Need".
☆59 Updated 6 years ago
rdspring1 / PyTorch_GBW_LM
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
☆123 Updated 6 years ago
ymcui / LAMB_Optimizer_TF
LAMB Optimizer for Large Batch Training (TensorFlow version)
☆121 Updated 5 years ago
loshchil / AdamW-and-SGDW
Decoupled Weight Decay Regularization (ICLR 2019)
☆281 Updated 6 years ago
leimao / Two-Layer-Hierarchical-Softmax-PyTorch
Two-Layer Hierarchical Softmax Implementation for PyTorch
☆70 Updated 4 years ago
contentinnovation / NeurIPS-2018-papers
Machine-generated summaries and highlights of every accepted paper at the Thirty-second Conference on Neural Information Processing Syste…
☆71 Updated 6 years ago
2014mchidamb / TorchGlove
PyTorch implementation of Global Vectors for Word Representation.
☆92 Updated 7 years ago
kondiz / fraternal-dropout
☆63 Updated 7 years ago
p-null / openai-gpt-pytorch
PyTorch implementation of OpenAI-GPT for ROC stories
☆51 Updated 6 years ago