RobertCsordas / transformer_generalization
The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.
☆67 · Dec 16, 2022 · Updated 3 years ago
Alternatives and similar repositories for transformer_generalization
Users who are interested in transformer_generalization are comparing it to the libraries listed below.
- Evidence Retrieval and Claim Verification for the FEVER shared task using Transformer Networks ☆12 · Feb 21, 2020 · Updated 5 years ago
- ☆44 · Aug 2, 2021 · Updated 4 years ago
- Code for gradient rollback, which explains predictions of neural matrix factorization models, as for example used for knowledge base comp… ☆21 · Mar 16, 2021 · Updated 4 years ago
- Unofficially Implements https://arxiv.org/abs/2112.05682 to get Linear Memory Cost on Attention for PyTorch ☆12 · Jan 16, 2022 · Updated 4 years ago
- Code for reproducing the results from the paper Avoiding Side Effects in Complex Environments ☆12 · Jun 3, 2021 · Updated 4 years ago
- ☆45 · Oct 11, 2021 · Updated 4 years ago
- The official repository for our paper "Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks". We… ☆46 · Oct 3, 2023 · Updated 2 years ago
- Scale your ML workers asynchronously across processes and machines ☆13 · Apr 1, 2025 · Updated 10 months ago
- Uncertainty-Guided Pseudo-Labelling with Model Averaging ☆11 · Sep 24, 2024 · Updated last year
- Generate multiple choice fill-in-the-blank questions from any article. ☆13 · Dec 8, 2022 · Updated 3 years ago
- Tensorflow implementation of MuZero algorithm ☆11 · Aug 23, 2022 · Updated 3 years ago
- Code and results accompanying our paper titled Leveraging Unlabeled Data to Predict Out-of-Distribution Performance at ICLR 2022 ☆10 · Dec 8, 2022 · Updated 3 years ago
- ☆11 · Jun 9, 2020 · Updated 5 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆49 · Jan 27, 2022 · Updated 4 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale ☆157 · Dec 20, 2023 · Updated 2 years ago
- Official codebase for Pretrained Transformers as Universal Computation Engines. ☆246 · Jan 14, 2022 · Updated 4 years ago
- Universal Python binding for the LMDB 'Lightning' Database ☆13 · Nov 7, 2017 · Updated 8 years ago
- The Return of Lexical Dependencies: Neural Lexicalized PCFGs (TACL) ☆33 · Sep 22, 2025 · Updated 4 months ago
- ☆10 · Jul 27, 2018 · Updated 7 years ago
- ☆11 · Jun 2, 2021 · Updated 4 years ago
- The corresponding code from our paper "Social Commonsense Reasoning with Multi-Head Knowledge Attention (EMNLP 2020)". Do not hesitate to… ☆11 · Jun 12, 2022 · Updated 3 years ago
- awesome unsupervised learning paper list ☆12 · Jan 4, 2018 · Updated 8 years ago
- A neural text style transfer model ☆12 · Jun 23, 2019 · Updated 6 years ago
- COMIC: This is the code repo of our TMM2019 work titled "COMIC: Towards a Compact Image Captioning Model with Attention". ☆15 · Jun 22, 2021 · Updated 4 years ago
- Codebase implementing LMs for learning the Dyck-(k,m) bounded hierarchical language ☆16 · Oct 11, 2020 · Updated 5 years ago
- Debiasing Methods in Natural Language Understanding Make Bias More Accessible: Code and Data ☆14 · Apr 24, 2022 · Updated 3 years ago
- code for the table-based open domain question answering project, with paper title: "Reasoning over Hybrid Chain for Table-and-Text Open D… ☆12 · Sep 16, 2022 · Updated 3 years ago
- Training vision models with full-batch gradient descent and regularization ☆39 · Feb 14, 2023 · Updated 3 years ago
- The dataset and code for ACL 2022 paper "SciNLI: A Corpus for Natural Language Inference on Scientific Text" are released here. ☆28 · Oct 17, 2023 · Updated 2 years ago
- ☆32 · Dec 10, 2020 · Updated 5 years ago
- Code Repo for "Differentiable Open-Ended Commonsense Reasoning" (NAACL 2021) ☆32 · Jun 30, 2023 · Updated 2 years ago
- ☆18 · Jun 10, 2022 · Updated 3 years ago
- ICLR 2022 paper ☆16 · May 6, 2022 · Updated 3 years ago
- An example docker container for runtime evaluation for the WIDER 2019 challenge track: face detection accuracy and runtime. ☆17 · Aug 7, 2019 · Updated 6 years ago
- Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch ☆15 · May 31, 2020 · Updated 5 years ago
- ☆16 · Mar 14, 2024 · Updated last year
- Paper: Lexicon Learning for Few-Shot Neural Sequence Modeling ☆16 · Jan 8, 2022 · Updated 4 years ago
- ☆34 · Aug 30, 2021 · Updated 4 years ago
- Understanding Training Dynamics of Deep ReLU Networks ☆306 · Oct 19, 2025 · Updated 3 months ago