lucidrains / electra-pytorch
A simple and working implementation of Electra, the fastest way to pretrain language models from scratch, in Pytorch
☆221 · Updated last year
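ELECTRA's pretraining objective is replaced-token detection: a small generator fills in masked positions, and a discriminator learns to spot which tokens were swapped. A minimal sketch of the labeling step in plain Python (illustrative only; `rtd_labels` is a hypothetical helper, not part of this repo's API):

```python
def rtd_labels(original, masked_positions, generator_samples):
    """Build replaced-token-detection targets, ELECTRA-style.

    The generator's sampled token is substituted at each masked
    position; the discriminator's label is 1 ("replaced") only when
    the sample differs from the original token (a lucky identical
    sample still counts as "original", as in the ELECTRA paper).
    """
    corrupted = list(original)
    labels = [0] * len(original)
    for pos, sample in zip(masked_positions, generator_samples):
        corrupted[pos] = sample
        if sample != original[pos]:
            labels[pos] = 1
    return corrupted, labels


# Token ids 5, 7, 9, 2; positions 1 and 3 were masked; the generator
# guessed 7 (correct) and 4 (wrong), so only position 3 is "replaced".
corrupted, labels = rtd_labels([5, 7, 9, 2], [1, 3], [7, 4])
# corrupted == [5, 7, 9, 4], labels == [0, 0, 0, 1]
```

The discriminator is then trained with a per-token binary loss against these labels, which is what makes ELECTRA more sample-efficient than plain masked language modeling: every position contributes a training signal, not just the masked ones.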
Alternatives and similar repositories for electra-pytorch:
Users interested in electra-pytorch are comparing it to the libraries listed below.
- Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!) ☆326 · Updated last year
- Implementation of Feedback Transformer in Pytorch ☆105 · Updated 3 years ago
- Implementation of the GBST block from the Charformer paper, in Pytorch ☆117 · Updated 3 years ago
- An implementation of masked language modeling for Pytorch, made as concise and simple as possible ☆178 · Updated last year
- Language Modeling Example with Transformers and PyTorch Lightning ☆65 · Updated 4 years ago
- ☆213 · Updated 4 years ago
- Implementation of Mixout with PyTorch ☆74 · Updated 2 years ago
- Code for the NAACL 2022 long paper "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings" ☆293 · Updated 2 years ago
- Repository containing code for "How to Train BERT with an Academic Budget" paper ☆311 · Updated last year
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines ☆134 · Updated last year
- Understanding the Difficulty of Training Transformers ☆328 · Updated 2 years ago
- Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis ☆145 · Updated 3 years ago
- Pytorch implementation of Compressive Transformers, from Deepmind ☆155 · Updated 3 years ago
- A 🤗-style implementation of BERT using lambda layers instead of self-attention ☆69 · Updated 4 years ago
- Fully featured implementation of Routing Transformer ☆288 · Updated 3 years ago
- Code for Multi-Head Attention: Collaborate Instead of Concatenate ☆152 · Updated last year
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆361 · Updated 2 years ago
- Implementation of Marge, Pre-training via Paraphrasing, in Pytorch ☆75 · Updated 4 years ago
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences" ☆70 · Updated last year
- Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx… ☆136 · Updated last year
- Sequence modeling with Mega ☆298 · Updated 2 years ago
- Trains Transformer model variants. Data isn't shuffled between batches. ☆139 · Updated 2 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale ☆154 · Updated last year
- Implementation of Self-adjusting Dice Loss from "Dice Loss for Data-imbalanced NLP Tasks" paper ☆107 · Updated 4 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021) ☆226 · Updated 2 years ago
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆160 · Updated last year
- Unofficial PyTorch implementation of Fastformer based on the paper "Fastformer: Additive Attention Can Be All You Need" ☆134 · Updated 3 years ago
- PyTorch – SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models ☆61 · Updated 2 years ago
- Flexible components pairing 🤗 Transformers with Pytorch Lightning ☆612 · Updated 2 years ago
- Minimalist implementation of a BERT Sentence Classifier with PyTorch Lightning, Transformers and PyTorch-NLP ☆216 · Updated last year