renatoviolin / Switch-Transformers-in-Seq2Seq
☆23 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for Switch-Transformers-in-Seq2Seq
- NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings ☆53 · Updated 5 months ago
- The code of the paper "Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation" published at NeurIPS 202… ☆42 · Updated 2 years ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆29 · Updated last week
- Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Lo… ☆39 · Updated 10 months ago
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022) ☆29 · Updated 2 years ago
- ☆31 · Updated 7 months ago
- Staged Training for Transformer Language Models ☆30 · Updated 2 years ago
- This is the official implementation of the paper: "Contrastive Learning of Sentence Embeddings from Scratch" ☆36 · Updated last year
- Implementation code for the paper "Meta-learning via Language Model In-context Tuning" (ACL 2022) ☆21 · Updated 2 years ago
- [EMNLP 2023] ALCUNA: Large Language Models Meet New Knowledge ☆25 · Updated last year
- Code for the ACL paper "Zero-Shot Text Classification via Self-Supervised Tuning" ☆23 · Updated last year
- DEMix Layers for Modular Language Modeling ☆53 · Updated 3 years ago
- [NeurIPS 2022] Generating Training Data with Language Models: Towards Zero-Shot Language Understanding ☆62 · Updated 2 years ago
- PyTorch code for the EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialog… ☆27 · Updated 3 years ago
- This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie… ☆26 · Updated 2 years ago
- Code for the ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models" ☆28 · Updated last year
- ☆10 · Updated 3 years ago
- ☆80 · Updated 2 years ago
- ☆18 · Updated 3 months ago
- The original Backpack Language Model implementation, a fork of FlashAttention ☆64 · Updated last year
- The official code of our paper at EMNLP 2022: Back to the Future: Bidirectional Information Decoupling Network for Multi-turn Dialogue Mo… ☆15 · Updated last year
- Adding new tasks to T0 without catastrophic forgetting ☆30 · Updated 2 years ago
- Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021. ☆51 · Updated 2 years ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆47 · Updated 4 months ago
- Source code for "Sequence-Level Training for Non-Autoregressive Neural Machine Translation". ☆23 · Updated 2 years ago
- Source code for the paper: Knowledge Inheritance for Pre-trained Language Models ☆38 · Updated 2 years ago
- ☆26 · Updated 8 months ago
- ☆14 · Updated last year
- This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Pr… ☆24 · Updated 2 years ago
- ☆13 · Updated 2 years ago