CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆36 · Updated 3 years ago
Alternatives and similar repositories for QKNorm:
Users interested in QKNorm are comparing it to the repositories listed below:
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Updated 2 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 · Updated last year
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in PyTorch ☆45 · Updated 3 years ago
- ☆18 · Updated 7 months ago
- ☆13 · Updated 2 years ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings ☆16 · Updated last year
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆33 · Updated 4 years ago
- Source code repo for paper "TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation" ☆10 · Updated last year
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆47 · Updated 2 years ago
- Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data ☆56 · Updated 3 years ago
- This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale" ☆26 · Updated last year
- ☆29 · Updated 2 years ago
- ☆16 · Updated last year
- Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Tra… ☆29 · Updated 3 years ago
- Exploring Few-Shot Adaptation of Language Models with Tables ☆23 · Updated 2 years ago
- ☆22 · Updated 3 years ago
- DEMix Layers for Modular Language Modeling ☆53 · Updated 3 years ago
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ☆32 · Updated 3 years ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆20 · Updated last month
- Implementation of the retriever distillation procedure as outlined in the paper "Distilling Knowledge from Reader to Retriever" ☆32 · Updated 4 years ago
- ☆13 · Updated 2 years ago
- A variant of Transformer-XL where the memory is updated not with a queue, but with attention ☆47 · Updated 4 years ago
- Few-shot Learning with Auxiliary Data ☆26 · Updated last year
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆60 · Updated 2 years ago
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated last year
- A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering ☆40 · Updated 4 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing ☆15 · Updated last year
- Staged Training for Transformer Language Models ☆31 · Updated 2 years ago