Staged Training for Transformer Language Models
☆33 · Mar 31, 2022 · Updated 3 years ago
Alternatives and similar repositories for staged-training
Users who are interested in staged-training are comparing it to the libraries listed below.
- ☆16 · May 6, 2021 · Updated 4 years ago
- Temporal and Causal Relation extraction module for the Newsreader project. ☆10 · Oct 26, 2015 · Updated 10 years ago
- decontamination ☆26 · Dec 3, 2025 · Updated 3 months ago
- Implementation of Cascaded Head-colliding Attention (ACL'2021) ☆11 · Sep 16, 2021 · Updated 4 years ago
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization ☆12 · Nov 23, 2021 · Updated 4 years ago
- LTG-Bert ☆34 · Jan 8, 2024 · Updated 2 years ago
- “Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition” (EMNLP 2022) ☆16 · Feb 2, 2023 · Updated 3 years ago
- statnlp-neural ☆32 · Sep 26, 2019 · Updated 6 years ago
- Code and data for Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22) ☆15 · Sep 6, 2022 · Updated 3 years ago
- ☆13 · Feb 12, 2023 · Updated 3 years ago
- UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi… ☆31 · Dec 5, 2022 · Updated 3 years ago
- The implementation for our paper, "Improving Simultaneous Machine Translation with Monolingual Data," accepted to AAAI 2023. 🎉 ☆12 · Jul 19, 2023 · Updated 2 years ago
- ☆16 · May 14, 2024 · Updated last year
- GC4LM: A Colossal (Biased) language model for German ☆13 · May 2, 2021 · Updated 4 years ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is} ☆14 · Jun 18, 2023 · Updated 2 years ago
- Getting interpretable dimensions in word embedding spaces. ☆15 · Jul 6, 2023 · Updated 2 years ago
- Meta Representation Transformation for Low-resource Cross-lingual Learning ☆41 · May 5, 2021 · Updated 4 years ago
- ☆14 · Jul 11, 2022 · Updated 3 years ago
- PyTorch Language Modeling Toolkit for Fast Weight Programmers ☆19 · Jun 11, 2025 · Updated 8 months ago
- Memory-efficient transformer. Work in progress. ☆19 · Sep 17, 2022 · Updated 3 years ago
- c++ mosestokenizer ☆18 · Mar 13, 2024 · Updated last year
- Implementation of the paper 'Sentence Bottleneck Autoencoders from Transformer Language Models' ☆17 · Mar 14, 2022 · Updated 3 years ago
- Group-conditional DRO to alleviate spurious correlations ☆15 · Jul 15, 2021 · Updated 4 years ago
- Set-Equivariant Deep Learning Models ☆22 · Dec 23, 2021 · Updated 4 years ago
- Named entity recognition for the legal domain ☆43 · Jun 1, 2021 · Updated 4 years ago
- Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging" ☆16 · May 31, 2019 · Updated 6 years ago
- Source code for the ACL-IJCNLP 2021 paper entitled "T-DNA: Taming Pre-trained Language Models with N-gram Representations for Low-Resourc… ☆19 · Jan 12, 2023 · Updated 3 years ago
- Physarum Powered Differentiable Linear Programming Layers ☆18 · Oct 27, 2021 · Updated 4 years ago
- TaskMet Task-driven Metric Learning for Model Learning ☆20 · Feb 9, 2024 · Updated 2 years ago
- Temporarily remove unused tokens during training to save RAM and improve speed. ☆23 · Jun 15, 2025 · Updated 8 months ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆22 · Feb 14, 2024 · Updated 2 years ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Jul 13, 2022 · Updated 3 years ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆26 · Feb 16, 2026 · Updated 2 weeks ago
- [NeurIPS 2022] "A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models", Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li… ☆21 · Jan 9, 2024 · Updated 2 years ago
- Codebase for running (conditional) probing experiments ☆22 · Nov 13, 2022 · Updated 3 years ago
- Implementation of paper "Probabilistic Active Meta-Learning" (NeurIPS 2020). ☆20 · Dec 2, 2020 · Updated 5 years ago
- UNLP 2025 Shared Task on Detecting Social Media Manipulation ☆23 · Aug 4, 2025 · Updated 7 months ago
- ☆20 · Dec 16, 2020 · Updated 5 years ago
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models" ☆29 · Jul 14, 2023 · Updated 2 years ago