hendrycks / GELUs
A smoother activation function (undergrad code)
☆108 · Updated 4 years ago
Alternatives and similar repositories for GELUs:
Users who are interested in GELUs are comparing it to the repositories listed below:
- Sparse and structured neural attention mechanisms ☆222 · Updated 4 years ago
- PyTorch implementation of the Accelerated SGD algorithm. ☆215 · Updated 6 years ago
- LAnguage Modelling Benchmarks ☆137 · Updated 4 years ago
- Adaptive Softmax implementation for PyTorch ☆80 · Updated 5 years ago
- PyTorch implementation of Global Vectors for Word Representation. ☆91 · Updated 6 years ago
- Official PyTorch Implementation of Length-Adaptive Transformer (ACL 2021) ☆101 · Updated 4 years ago
- PyTorch DataLoader for seq2seq ☆84 · Updated 5 years ago
- Encoding position with the word embeddings. ☆82 · Updated 6 years ago
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization" ☆89 · Updated 4 years ago
- Checking the interpretability of attention on text classification models ☆47 · Updated 5 years ago
- ☆61 · Updated last year
- This repository contains the code for running the character-level Sandwich Transformers from our ACL 2020 paper on Improving Transformer … ☆55 · Updated 4 years ago
- ☆213 · Updated 4 years ago
- ☆74 · Updated 7 years ago
- ☆64 · Updated 4 years ago
- Re-implementation of the Noise Contrastive Estimation algorithm for PyTorch, following "Noise-contrastive estimation: A new estimation pr… ☆45 · Updated 5 years ago
- PyTorch implementation of OpenAI-GPT for ROC stories ☆51 · Updated 5 years ago
- A PyTorch implementation of the Reformer Network (https://openreview.net/pdf?id=rkgNKkHtvB) ☆53 · Updated 2 years ago
- Code for EMNLP 2019 paper "Attention is not not Explanation" ☆57 · Updated 3 years ago
- tunz's CUDA PyTorch operator (MaskedSoftmax) ☆75 · Updated 5 years ago
- Code for Sluice networks: Learning what to share between loosely related tasks ☆151 · Updated 6 years ago
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆112 · Updated 5 years ago
- Embedding Quantization (Compress Word Embeddings) ☆86 · Updated 5 years ago
- NeurIPS 2019 - Learning Data Manipulation for Augmentation and Weighting ☆109 · Updated 4 years ago
- Code for EMNLP18 paper "Spherical Latent Spaces for Stable Variational Autoencoders" ☆168 · Updated 6 years ago
- PyTorch Examples repo for "ReZero is All You Need: Fast Convergence at Large Depth" ☆62 · Updated 6 months ago
- PyTorch implementation of the Lookahead Optimizer ☆189 · Updated 2 years ago
- PyTorch implementation of ByteNet from the paper "Neural Machine Translation in Linear Time" ☆46 · Updated 7 years ago
- Factorization of the neural parameter space for zero-shot multi-lingual and multi-task transfer ☆39 · Updated 4 years ago
- PyTorch implementation of the NIPS'17 paper "Training Deep Networks without Learning Rates Through Coin Betting". ☆37 · Updated 6 years ago