masakhane-io / masakhane-reading-group
Agile reading group that works
☆13 · Updated 2 years ago
Related projects:
- A guide to building language technology in new languages. ☆57 · Updated 2 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated last year
- Statistics on multilingual datasets ☆17 · Updated 2 years ago
- ☆40 · Updated 2 years ago
- Dataset of ML and NLP papers ☆35 · Updated 2 years ago
- A program to choose transfer languages for cross-lingual learning ☆70 · Updated last year
- This repository hosts my experiments for the project I did with OffNote Labs. ☆11 · Updated 3 years ago
- Code for extracting parallel corpora from pmindia ☆16 · Updated 4 years ago
- Curated list of publicly available parallel corpora for Indian languages ☆30 · Updated 3 years ago
- NTREX -- News Test References for MT Evaluation ☆73 · Updated 3 months ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆72 · Updated last year
- Creating super-parallel corpora of more than 1,500 unique languages for NLP research ☆32 · Updated last year
- ☆23 · Updated 4 years ago
- ☆13 · Updated 2 years ago
- Add noise to your text; can be used to improve a synthetic training corpus for Neural Machine Translation ☆39 · Updated 5 years ago
- A tiny BERT for low-resource monolingual models ☆28 · Updated 4 months ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆46 · Updated 3 years ago
- This code provides a word-level language identification tool for identifying the language of individual words in code-mixed text. e.g. The tex… ☆49 · Updated 4 years ago
- Zero-shot Transfer Learning from English to Arabic ☆29 · Updated 2 years ago
- ☆16 · Updated last year
- ☆17 · Updated 2 years ago
- A Benchmark Dataset for Understanding Disfluencies in Question Answering ☆60 · Updated 3 years ago
- ☆91 · Updated 7 months ago
- Software for transferring pre-trained English models to foreign languages ☆18 · Updated last year
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal… ☆80 · Updated 3 years ago
- Repository for the English-Hindi Codemixed to Monolingual English Parallel Corpus ☆13 · Updated 5 years ago
- Arabic Dialect Identification on AOC data. ☆23 · Updated 5 years ago
- Codebase for probing and visualizing multilingual models. ☆45 · Updated 4 years ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago
- Assessing syntactic abilities of BERT ☆39 · Updated 5 years ago