lucidrains / mlm-pytorch

An implementation of masked language modeling for Pytorch, made as concise and simple as possible

☆179

Alternatives and similar repositories for mlm-pytorch

Users who are interested in mlm-pytorch are comparing it to the libraries listed below.

lucidrains / electra-pytorch
A simple and working implementation of Electra, the fastest way to pretrain language models from scratch, in Pytorch
☆227 Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆297 Updated 2 years ago
lucidrains / charformer-pytorch
Implementation of the GBST block from the Charformer paper, in Pytorch
☆118 Updated 4 years ago
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆538 Updated last year
lucidrains / compressive-transformer-pytorch
Pytorch implementation of Compressive Transformers, from Deepmind
☆163 Updated 3 years ago
yang-zhang / lightning-language-modeling
Language Modeling Example with Transformers and PyTorch Lightning
☆65 Updated 4 years ago
XuezheMax / fairseq-apollo
FairSeq repo with Apollo optimizer
☆114 Updated last year
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆329 Updated 3 years ago
SeanNaren / minGPT
A minimal PyTorch Lightning OpenAI GPT with DeepSpeed Training!
☆112 Updated 2 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆204 Updated last year
richarddwang / electra_pytorch
Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)
☆330 Updated last year
guolinke / TUPE
Transformer with Untied Positional Encoding (TUPE). Code of paper "Rethinking Positional Encoding in Language Pre-training". Improve exis…
☆251 Updated 3 years ago
google-research / bigbird
Transformers for Longer Sequences
☆618 Updated 2 years ago
ChunyuanLI / Optimus
Optimus: the first large-scale pre-trained VAE language model
☆390 Updated last year
facebookresearch / transformer-sequential
Trains Transformer model variants. Data isn't shuffled between batches.
☆143 Updated 2 years ago
lucidrains / h-transformer-1d
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
☆163 Updated last year
IntelLabs / academic-budget-bert
Repository containing code for "How to Train BERT with an Academic Budget" paper
☆314 Updated last year
laiguokun / Funnel-Transformer
☆218 Updated 5 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆229 Updated 11 months ago
huanghonggit / Mask-Language-Model
PyTorch; masked language model; BERT
☆72 Updated 5 years ago
lucidrains / feedback-transformer-pytorch
Implementation of Feedback Transformer in Pytorch
☆107 Updated 4 years ago
lucidrains / gated-state-spaces-pytorch
Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch
☆101 Updated 2 years ago
Fraser-Greenlee / transformer-vae
A library for making Transformer Variational Autoencoders. (Extends the Huggingface/transformers library.)
☆142 Updated 4 years ago
lucidrains / routing-transformer
Fully featured implementation of Routing Transformer
☆297 Updated 3 years ago
Fraser-Greenlee / T5-VAE
Check out the new version at the link!
☆22 Updated 4 years ago
seongminp / transformers-into-vaes
Code for "Finetuning Pretrained Transformers into Variational Autoencoders"
☆39 Updated 3 years ago
google / flaxformer
☆361 Updated last year
zphang / minimal-opt
☆67 Updated 2 years ago
naver / gdc
Code accompanying our papers on the "Generative Distributional Control" framework
☆118 Updated 2 years ago
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆470 Updated 3 weeks ago