pdufter / minimult
Analyzing mBERT's multilinguality in a small laboratory setting
☆13 Updated last year
Alternatives and similar repositories for minimult:
Users who are interested in minimult are comparing it to the libraries listed below.
- Pretraining scripts for BART transformer model ☆11 Updated last year
- ☆25 Updated last year
- ☆24 Updated 5 years ago
- Tool to perform paired evaluation of automatic systems ☆12 Updated 3 years ago
- Dependency Parsing as Sequence Labeling ☆26 Updated 7 months ago
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) ☆22 Updated 4 years ago
- The Referential Reader: A Recurrent Entity Network for Anaphora Resolution, published at ACL 2019 ☆19 Updated 5 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆26 Updated 3 years ago
- Data and scripts for the proper evaluation of cross-lingual embeddings in multiple languages ☆14 Updated 4 years ago
- Diverse Natural Language Inference Collection - NLI dataset that can be used to evaluate how well models perform distinct types of reasoning… ☆36 Updated 4 years ago
- ☆28 Updated 9 months ago
- Improving cross-lingual word embeddings by meeting in the middle ☆23 Updated 4 years ago
- ☆20 Updated 4 years ago
- Parsing only with Pretraining Networks ☆16 Updated 7 months ago
- A coreference evaluation package for the CoNLL and ARRAU datasets ☆40 Updated 4 years ago
- Python source code for EMNLP 2020 paper "Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT". ☆35 Updated 2 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering ☆38 Updated 3 years ago
- A program to choose transfer languages for cross-lingual learning ☆72 Updated last year
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization" ☆24 Updated 3 years ago
- Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework ☆52 Updated 5 years ago
- This repository contains the code for applying One-Token Approximation to a pretrained language model using subword-level tokenization. ☆11 Updated 4 years ago
- ☆25 Updated 2 years ago
- Alignment and annotation for comparable documents. ☆22 Updated 6 years ago
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆37 Updated 2 years ago
- Statistics on multilingual datasets ☆17 Updated 2 years ago
- This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering". ☆79 Updated 3 years ago
- ☆19 Updated 4 years ago
- ☆32 Updated 3 years ago
- This data release is meant to accompany and document the paper: https://arxiv.org/abs/2004.11997 Collecting Entailment Data for Pretrain… ☆14 Updated 4 years ago
- End-to-end shallow discourse parser ☆20 Updated last year