yangalan123 / FineTuningStability

Code and data for the EMNLP 2022 paper "Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping"

☆12

Related projects:

renll / SparseLT
[EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing
☆15 · Updated last year
princeton-nlp / align-mlm
☆11 · Updated last year
violet-zct / swarm-distillation-zero-shot
☆22 · Updated last year
ThomasScialom / T0_continual_learning
Adding new tasks to T0 without catastrophic forgetting
☆30 · Updated last year
INK-USC / ReCross
ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation
☆22 · Updated 2 years ago
suzgunmirac / crowd-sampling
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
☆18 · Updated last year
rabeehk / perfect
☆13 · Updated this week
ekinakyurek / influence
Influence Experiments
☆36 · Updated last year
ghrua / NgramRes
☆20 · Updated last year
cliang1453 / SAGE
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)
☆29 · Updated 2 years ago
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆56 · Updated last year
JunjieHu / amber
Explicit Alignment Objectives for Multilingual Bidirectional Encoders
☆13 · Updated 3 years ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆32 · Updated 2 years ago
jungokasai / twist_decoding
☆28 · Updated 2 years ago
allenai / staged-training
Staged Training for Transformer Language Models
☆28 · Updated 2 years ago
da03 / criticize_text_generation
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆11 · Updated last year
Alibaba-NLP / MuVER
[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations
☆30 · Updated 2 years ago
ahmetustun / hyperx
☆19 · Updated last year
nathanhu0 / CaMeLS
Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076.
☆23 · Updated 7 months ago
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆16 · Updated last month
deep-spin / hallucinations-in-nmt
☆17 · Updated 8 months ago
MikeWangWZHL / Zemi
Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings
☆16 · Updated last year
sustcsonglin / TN-PCFG
Source code of NAACL2021 "PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols" and ACL2021 main conferenc…
☆44 · Updated 6 months ago
peterbhase / LAS-NL-Explanations
Code for paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?"
☆20 · Updated 3 years ago
deep-spin / qaware-decode
A repository for experiments in quality-aware decoding
☆14 · Updated 2 years ago
wxjiao / WMT2022-Large-Scale-African
Introduction to "Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages".
☆13 · Updated last year
princeton-nlp / WhatICLLearns
[ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
☆21 · Updated last year
GChrysostomou / ood_faith
☆13 · Updated last year
asahi417 / lm-vocab-trimmer
Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…
☆29 · Updated last month
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆30 · Updated 2 years ago