lukemelas / simple-bert
A simple PyTorch implementation of BERT, complete with pretrained models and training scripts.
☆41 · Updated 5 years ago
Related projects
Alternatives and complementary repositories for simple-bert
- Implementation of Kronecker Attention in Pytorch ☆17 · Updated 4 years ago
- An open source implementation of CLIP. ☆32 · Updated 2 years ago
- Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch ☆59 · Updated 3 years ago
- A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering ☆40 · Updated 4 years ago
- GPT, but made only out of MLPs ☆87 · Updated 3 years ago
- Official PyTorch implementation of RIO ☆18 · Updated 3 years ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆36 · Updated 4 years ago
- ☆24 · Updated 3 years ago
- Large dataset storage format for Pytorch ☆45 · Updated 3 years ago
- ☆12 · Updated 3 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- Implementation of the Remixer Block from the Remixer paper, in Pytorch ☆35 · Updated 3 years ago
- [NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks ☆59 · Updated last year
- Code for scaling Transformers ☆26 · Updated 3 years ago
- ☆34 · Updated 5 years ago
- This dataset contains about 110k images annotated with the depth and occlusion relationships between arbitrary objects. It enables resear… ☆16 · Updated 3 years ago
- A python library for highly configurable transformers - easing model architecture search and experimentation. ☆49 · Updated 2 years ago
- Re-implementation of local descriptor HardNet training in fastai2 + kornia ☆21 · Updated 4 years ago
- ☆21 · Updated 3 years ago
- Code for "MIM: Mutual Information Machine" paper. ☆16 · Updated 2 years ago
- Code for Multi-Head Attention: Collaborate Instead of Concatenate ☆151 · Updated last year
- ☆32 · Updated 5 years ago
- Deep Unsupervised Similarity Learning using Partially Ordered Sets (CVPR17) ☆20 · Updated 3 years ago
- ☆24 · Updated 6 months ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆66 · Updated last year
- Tensorflow 2.x implementation of Gradient Origin Networks ☆13 · Updated 4 years ago
- ☆25 · Updated 3 years ago
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
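One entry above links ReLA (Rectified Linear Attention, https://arxiv.org/abs/2104.07012), which replaces the softmax in scaled dot-product attention with a ReLU. A minimal PyTorch sketch of that idea is below; the function name and shapes are illustrative, not taken from the linked repo, and a plain LayerNorm stands in for the RMSNorm-with-gating the paper uses to stabilize the unnormalized output.

```python
import math
import torch
import torch.nn.functional as F

def rela_attention(q, k, v):
    """Rectified Linear Attention sketch.

    q, k, v: (batch, seq_len, dim). Returns (batch, seq_len, dim).
    """
    d = q.size(-1)
    # Standard scaled dot-product scores: (batch, seq_len, seq_len)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    # ReLA: ReLU instead of softmax -> sparse, unnormalized weights
    weights = F.relu(scores)
    out = weights @ v
    # Stand-in normalization (the paper uses RMSNorm + gating here)
    return F.layer_norm(out, (d,))

q = k = v = torch.randn(2, 8, 16)
out = rela_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 16])
```

Because ReLU zeros out negative scores entirely, many attention weights become exactly zero, which is the sparsity the paper trades against softmax's dense normalization.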