ymcui / LAMB_Optimizer_TFLinks

LAMB Optimizer for Large Batch Training (TensorFlow version)

☆121

Alternatives and similar repositories for LAMB_Optimizer_TF

Users that are interested in LAMB_Optimizer_TF are comparing it to the libraries listed below

Sorting:

YerevaNN / DIIN-in-Keras
Reproducing Densely Interactive Inference Network in Keras
☆75Updated 7 years ago
LiyuanLucasLiu / LD-Net
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
☆147Updated 5 years ago
rdspring1 / PyTorch_GBW_LM
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset
☆123Updated 6 years ago
yangsaiyong / tf-adaptive-softmax-lstm-lm
The experiment result of LSTM language models on PTB (Penn Treebank) and GBW (Google Billion Word) using AdaptiveSoftmax on TensorFlow.
☆100Updated 6 years ago
bzhangGo / transformer-aan
souce code for "Accelerating Neural Transformer via an Average Attention Network"
☆78Updated 6 years ago
gonglinyuan / StackingBERT
Source code for "Efficient Training of BERT by Progressively Stacking"
☆113Updated 6 years ago
roomylee / self-attentive-emb-tf
Simple Tensorflow Implementation of "A Structured Self-attentive Sentence Embedding" (ICLR 2017)
☆91Updated 7 years ago
stanfordmlgroup / nlm-noising
☆74Updated 8 years ago
lambdal / bert
TensorFlow code and pre-trained models for BERT
☆116Updated 5 years ago
CyberZHG / keras-transformer-xl
Transformer-XL with checkpoint loader
☆68Updated 3 years ago
Edy-Barraza / Transformer_Distillation
Knowledge Distillation For Transformer Language Models
☆52Updated last year
taoshen58 / BiBloSA
Bi-Directional Block Self-Attention
☆122Updated 7 years ago
salesforce / nonauto-nmt
PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"
☆271Updated 3 years ago
whr94621 / NJUNMT-pytorch
☆93Updated 4 years ago
haoyuhu / bert-multi-gpu
Feel free to fine tune large BERT models with Multi-GPU and FP16 support.
☆192Updated 5 years ago
intersun / PKD-for-BERT-Model-Compression
pytorch implementation for Patient Knowledge Distillation for BERT Model Compression
☆203Updated 6 years ago
guillaumegenthial / tf_metrics
Multi-class metrics for Tensorflow
☆223Updated 3 years ago
CLUEbenchmark / LightLM
高性能小模型测评 Shared Tasks in NLPCC 2020. Task 1 - Light Pre-Training Chinese Language Model for NLP Task
☆60Updated 5 years ago
harvardnlp / var-attn
Latent Alignment and Variational Attention
☆327Updated 6 years ago
yinwenpeng / Attentive_Convolution
This released code corresponds to TACL paper "attentive convolution". Attentive Convolution aims to generate a vector for two sentences.
☆105Updated 7 years ago
titu1994 / keras-LAMB-Optimizer
Implementation of the LAMB optimizer for Keras from the paper "Reducing BERT Pre-Training Time from 3 Days to 76 Minutes"
☆75Updated 6 years ago
matthew-z / R-net
R-net in PyTorch, with ELMo
☆198Updated 5 years ago
kevinkwl / AoAReader
PyTorch implementation of Attention-over-Attention Neural Networks for Reading Comprehension
☆59Updated 8 years ago
JetRunner / BERT-of-Theseus
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).
☆315Updated 2 years ago
ZNLP / sb-nmt
Code for Synchronous Bidirectional Neural Machine Translation (SB-NMT)
☆66Updated 6 years ago
jcyk / BERT
a simple yet complete implementation of the popular BERT model
☆128Updated 5 years ago
text-representation / local-context-unit
local-context-unit
☆56Updated 7 years ago
cerebroai / reformers
Efficient Transformers for research, PyTorch and Tensorflow using Locality Sensitive Hashing
☆95Updated 5 years ago
lancopku / Prime
A simple module consistently outperforms self-attention and Transformer model on main NMT datasets with SoTA performance.
☆85Updated 2 years ago
liqunhit / NeuralClassifier
An Open-source Neural Hierarchical Multi-label Text Classification Toolkit
☆78Updated 6 years ago