yangalan123 / FineTuningStability
Code and data of the EMNLP 2022 paper "Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping"
☆12Updated last year
Related projects:
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆15Updated last year
- ☆11Updated last year
- ☆22Updated last year
- Adding new tasks to T0 without catastrophic forgetting☆30Updated last year
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation☆22Updated 2 years ago
- Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding☆18Updated last year
- ☆13Updated this week
- Influence Experiments☆36Updated last year
- ☆20Updated last year
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)☆29Updated 2 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆56Updated last year
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders☆13Updated 3 years ago
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".☆32Updated 2 years ago
- ☆28Updated 2 years ago
- Staged Training for Transformer Language Models☆28Updated 2 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Updated last year
- [EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations☆30Updated 2 years ago
- ☆19Updated last year
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076.☆23Updated 7 months ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training☆16Updated last month
- ☆17Updated 8 months ago
- Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings☆16Updated last year
- source code of NAACL2021 "PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols" and ACL2021 main conferenc…☆44Updated 6 months ago
- Code for paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?"☆20Updated 3 years ago
- A repository for experiments in quality-aware decoding☆14Updated 2 years ago
- Introduction to "Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages".☆13Updated last year
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning☆21Updated last year
- ☆13Updated last year
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…☆29Updated last month
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆30Updated 2 years ago