CyndxAI / QKNormLinks
Code for the paper "Query-Key Normalization for Transformers"
☆41Updated 4 years ago
Alternatives and similar repositories for QKNorm
Users that are interested in QKNorm are comparing it to the libraries listed below
Sorting:
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.☆21Updated 2 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …☆16Updated 2 weeks ago
- Source code for the paper "Prefix Language Models are Unified Modal Learners"☆43Updated 2 years ago
- ☆33Updated 4 years ago
- Source code for ScaleGrad☆18Updated 3 years ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration☆19Updated 2 years ago
- ☆22Updated 3 years ago
- ☆97Updated last year
- ☆20Updated last year
- ☆16Updated 4 years ago
- Transformers at any scale☆41Updated last year
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning☆21Updated last year
- Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.☆28Updated 4 years ago
- Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch☆46Updated 4 years ago
- DEMix Layers for Modular Language Modeling☆53Updated 3 years ago
- This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …☆33Updated 2 years ago
- [EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations☆31Updated 3 years ago
- ☆17Updated 2 years ago
- ☆13Updated 3 years ago
- Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning☆11Updated 6 months ago
- ☆37Updated 2 years ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)☆33Updated 2 years ago
- Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃☆114Updated 2 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆14Updated 2 years ago
- ☆16Updated 11 months ago
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"☆28Updated 3 years ago
- A python library for highly configurable transformers - easing model architecture search and experimentation.☆49Updated 3 years ago
- ☆22Updated 2 years ago
- [ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators☆24Updated last year
- ☆32Updated last year