allenai / staged-training

Staged Training for Transformer Language Models

☆33

Alternatives and similar repositories for staged-training

Users who are interested in staged-training are comparing it to the repositories listed below.

frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆59 Updated 2 years ago
kernelmachine / demix
DEMix Layers for Modular Language Modeling
☆54 Updated 4 years ago
ThomasScialom / T0_continual_learning
Adding new tasks to T0 without catastrophic forgetting
☆33 Updated 3 years ago
da03 / criticize_text_generation
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆11 Updated 2 years ago
jxhe / efficient-knnlm
PyTorch implementation of the paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)
☆74 Updated 3 years ago
sunyt32 / torchscale
Transformers at any scale
☆41 Updated last year
Shark-NLP / CAB
☆31 Updated 2 years ago
RobertCsordas / ndr
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".
☆34 Updated 5 months ago
suzgunmirac / crowd-sampling
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
☆18 Updated 3 years ago
renll / SparseLT
[EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing
☆14 Updated 2 years ago
MikeWangWZHL / Zemi
Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings
☆16 Updated 2 years ago
jungokasai / twist_decoding
☆30 Updated 3 years ago
ekinakyurek / influence
Code for "Tracing Knowledge in Language Models Back to the Training Data"
☆39 Updated 2 years ago
microsoft / AMOS
[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators
☆25 Updated 2 years ago
RUCAIBox / ELMER
This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie…
☆26 Updated 3 years ago
gmftbyGMFTBY / Rep-Dropout
[NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
☆37 Updated 2 years ago
ghrua / NgramRes
☆22 Updated 3 years ago
jungokasai / beam_with_patience
☆46 Updated 3 years ago
tau-nlp / scrolls
The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".
☆69 Updated last year
yxuansu / Contrastive_Search_versus_Contrastive_Decoding
An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation
☆27 Updated last year
bigscience-workshop / architecture-objective
☆98 Updated 2 years ago
McGill-NLP / polytropon
☆54 Updated 2 years ago
jungokasai / deep-shallow
☆44 Updated 5 years ago
INK-USC / ReCross
ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation
☆24 Updated 3 years ago
princeton-nlp / LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
☆78 Updated 2 years ago
nyu-mll / SQuALITY
Query-focused summarization data
☆42 Updated 2 years ago
thunlp / Knowledge-Inheritance
Source code for paper: Knowledge Inheritance for Pre-trained Language Models
☆38 Updated 3 years ago
qqaatw / pytorch-realm-orqa
PyTorch reimplementation of REALM and ORQA
☆22 Updated 3 years ago
joeljang / ELM
[ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning
☆98 Updated 2 years ago
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆30 Updated 3 years ago