IDSIA / lmtool-fwp
PyTorch Language Modeling Toolkit for Fast Weight Programmers
☆19 Updated 8 months ago
Alternatives and similar repositories for lmtool-fwp
Users who are interested in lmtool-fwp are comparing it to the libraries listed below.
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ☆34 Updated 8 months ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆67 Updated 3 years ago
- Rationales for Sequential Predictions ☆40 Updated 3 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆63 Updated 3 years ago
- LTG-Bert ☆34 Updated 2 years ago
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 Updated 2 years ago
- Standalone pre-training recipe with JAX+Flax ☆35 Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 Updated 3 years ago
- FairSeq repo with Apollo optimizer ☆114 Updated 2 years ago
- diagNNose is a Python library that facilitates a broad set of tools for analysing hidden activations of neural models. ☆82 Updated 2 years ago
- Blog post ☆17 Updated last year
- Amos optimizer with JEstimator lib. ☆82 Updated last year
- Evaluation pipeline for the BabyLM Challenge 2023. ☆77 Updated 2 years ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆34 Updated 5 years ago
- [NeurIPS 2020] Official Implementation: "SMYRF: Efficient Attention using Asymmetric Clustering". ☆50 Updated 2 years ago
- Suite of 500 procedurally-generated NLP tasks to study language model adaptability ☆21 Updated 3 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆49 Updated 4 years ago
- Datasets for compositional learning ☆11 Updated 7 years ago
- This is a repository with the code for the EMNLP 2020 paper "Information-Theoretic Probing with Minimum Description Length" ☆71 Updated last year
- Code and data for the paper "Disentangling Uncertainty in Machine Translation Evaluation", accepted at EMNLP 2022. ☆23 Updated 2 years ago
- Implementation of the GBST block from the Charformer paper, in Pytorch ☆118 Updated 4 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆59 Updated 3 years ago
- Code of NAACL 2022 "Efficient Hierarchical Domain Adaptation for Pretrained Language Models" paper. ☆32 Updated 2 years ago