TalSchuster / CATs
Confident Adaptive Transformers
☆ 12 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for CATs
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" (☆ 56 · Updated last year)
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights (☆ 19 · Updated 2 years ago)
- ☆ 31 · Updated 10 months ago
- Pretraining summarization models using a corpus of nonsense (☆ 13 · Updated 3 years ago)
- Variable-order CRFs with structure learning (☆ 16 · Updated 3 months ago)
- ☆ 42 · Updated 4 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing (☆ 15 · Updated last year)
- Source code for "A Lightweight Recurrent Network for Sequence Modeling" (☆ 26 · Updated last year)
- ☆ 12 · Updated 2 years ago
- Masking vocabulary used in the ICLR 2021 spotlight paper PMI-Masking (☆ 14 · Updated 3 years ago)
- PyTorch implementation of the paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021) (☆ 71 · Updated 2 years ago)
- A study of the downstream instability of word embeddings (☆ 12 · Updated 2 years ago)
- Query-focused summarization data (☆ 41 · Updated last year)
- Influence Experiments (☆ 35 · Updated last year)
- Code to reproduce some of the results presented in the paper "SentenceMIM: A Latent Variable Language Model" (☆ 28 · Updated 2 years ago)
- Combining encoder-based language models (☆ 11 · Updated 3 years ago)
- Adding new tasks to T0 without catastrophic forgetting (☆ 30 · Updated 2 years ago)
- ☆ 21 · Updated 3 years ago
- ☆ 13 · Updated last year
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) (☆ 59 · Updated 2 years ago)
- lanmt ebm (☆ 11 · Updated 4 years ago)
- ☆ 12 · Updated 10 months ago
- ☆ 28 · Updated 2 years ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… (☆ 44 · Updated last year)
- ☆ 14 · Updated last month
- [ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators (☆ 24 · Updated last year)
- ☆ 13 · Updated 3 years ago
- ☆ 21 · Updated 2 years ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling (☆ 35 · Updated 11 months ago)