The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.
☆67 Dec 16, 2022 Updated 3 years ago
Alternatives and similar repositories for transformer_generalization
Users interested in transformer_generalization are comparing it to the repositories listed below.
- Code for the paper: Saying No is An Art: Contextualized Fallback Responses for Unanswerable Dialogue Queries ☆19 Nov 29, 2021 Updated 4 years ago
- ☆44 Aug 2, 2021 Updated 4 years ago
- Code for gradient rollback, which explains predictions of neural matrix factorization models, as for example used for knowledge base comp… ☆21 Mar 16, 2021 Updated 4 years ago
- Code for reproducing the results from the paper Avoiding Side Effects in Complex Environments ☆12 Jun 3, 2021 Updated 4 years ago
- Unofficially implements https://arxiv.org/abs/2112.05682 to get Linear Memory Cost on Attention for PyTorch ☆12 Jan 16, 2022 Updated 4 years ago
- ☆45 Oct 11, 2021 Updated 4 years ago
- The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We… ☆46 Oct 3, 2023 Updated 2 years ago
- Official PyTorch Implementation of Length-Adaptive Transformer (ACL 2021) ☆102 Nov 2, 2020 Updated 5 years ago
- Generate multiple choice fill-in-the-blank questions from any article. ☆13 Dec 8, 2022 Updated 3 years ago
- Uncertainty-Guided Pseudo-Labelling with Model Averaging ☆11 Sep 24, 2024 Updated last year
- TensorFlow implementation of MuZero algorithm ☆11 Aug 23, 2022 Updated 3 years ago
- Defeasible Natural Language Inference ☆13 Dec 4, 2020 Updated 5 years ago
- 🖼️📊 ☆11 Jun 9, 2020 Updated 5 years ago
- Code and results accompanying our paper titled Leveraging Unlabeled Data to Predict Out-of-Distribution Performance at ICLR 2022 ☆10 Dec 8, 2022 Updated 3 years ago
- Official PyTorch Implementation for the paper 'SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients' ☆17 Jan 12, 2022 Updated 4 years ago
- Official codebase for Pretrained Transformers as Universal Computation Engines. ☆246 Jan 14, 2022 Updated 4 years ago
- ☆10 Jul 27, 2018 Updated 7 years ago
- The corresponding code from our paper "Social Commonsense Reasoning with Multi-Head Knowledge Attention (EMNLP 2020)". Do not hesitate to… ☆11 Jun 12, 2022 Updated 3 years ago
- ☆11 Jun 2, 2021 Updated 4 years ago
- The Return of Lexical Dependencies: Neural Lexicalized PCFGs (TACL) ☆33 Sep 22, 2025 Updated 5 months ago
- Code for AAAI 2024 paper: CR-SAM: Curvature Regularized Sharpness-Aware Minimization ☆13 Nov 29, 2024 Updated last year
- Code for the table-based open domain question answering project, with paper title: "Reasoning over Hybrid Chain for Table-and-Text Open D… ☆12 Sep 16, 2022 Updated 3 years ago
- "An AGI architecture in vector space" Paper to be submitted to AGI-16 ☆14 Nov 16, 2024 Updated last year
- Debiasing Methods in Natural Language Understanding Make Bias More Accessible: Code and Data ☆14 Apr 24, 2022 Updated 3 years ago
- Codebase implementing LMs for learning the Dyck-(k,m) bounded hierarchical language ☆16 Oct 11, 2020 Updated 5 years ago
- NLP command-line assistant powered by OpenAI ☆21 Jan 27, 2024 Updated 2 years ago
- The dataset and code for ACL 2022 paper "SciNLI: A Corpus for Natural Language Inference on Scientific Text" are released here. ☆28 Oct 17, 2023 Updated 2 years ago
- Training vision models with full-batch gradient descent and regularization ☆39 Feb 14, 2023 Updated 3 years ago
- ☆32 Dec 10, 2020 Updated 5 years ago
- Code Repo for "Differentiable Open-Ended Commonsense Reasoning" (NAACL 2021) ☆32 Jun 30, 2023 Updated 2 years ago
- Code publication to the paper "Normalized Attention Without Probability Cage" ☆17 Nov 9, 2021 Updated 4 years ago
- ☆18 Jun 10, 2022 Updated 3 years ago
- ☆21 Jan 23, 2024 Updated 2 years ago
- Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch ☆15 May 31, 2020 Updated 5 years ago
- ICLR 2022 paper ☆16 May 6, 2022 Updated 3 years ago
- ☆18 Apr 19, 2024 Updated last year
- ☆13 Jul 25, 2024 Updated last year
- Our implementation of Shampoo optimizer based on https://arxiv.org/pdf/1802.09568.pdf ☆12 Dec 23, 2019 Updated 6 years ago
- An example docker container for runtime evaluation for the WIDER 2019 challenge track: face detection accuracy and runtime. ☆17 Aug 7, 2019 Updated 6 years ago