google-research / url-nlp
☆263 · Aug 1, 2025 · Updated 6 months ago
Alternatives and similar repositories for url-nlp
Users who are interested in url-nlp are comparing it to the libraries listed below.
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆106 · Apr 20, 2024 · Updated last year
- NTREX -- News Test References for MT Evaluation ☆88 · Jun 5, 2024 · Updated last year
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 10 months ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆27 · Updated this week
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond ☆12 · Aug 15, 2022 · Updated 3 years ago
- A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund. ☆18 · Apr 29, 2024 · Updated last year
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆27 · Aug 8, 2025 · Updated 6 months ago
- A simple library for querying the URIEL typological database. ☆95 · Apr 8, 2024 · Updated last year
- Hosts text-to-speech corpus and speech synthesizers for African languages. ☆18 · May 31, 2023 · Updated 2 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆386 · Nov 7, 2023 · Updated 2 years ago
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering ☆46 · Apr 5, 2022 · Updated 3 years ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 · Oct 2, 2025 · Updated 4 months ago
- Crosslingual Question Answering for African Languages ☆30 · Sep 27, 2024 · Updated last year
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Apr 2, 2022 · Updated 3 years ago
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks ☆20 · May 10, 2022 · Updated 3 years ago
- Finite-state script normalization and processing utilities ☆46 · Jan 14, 2026 · Updated 3 weeks ago
- ☆52 · Jun 6, 2023 · Updated 2 years ago
- ParCourE - Parallel Corpus Explorer ☆12 · Dec 27, 2021 · Updated 4 years ago
- 🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction. ☆17 · Aug 13, 2025 · Updated 6 months ago
- ☆21 · Oct 26, 2021 · Updated 4 years ago
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization" ☆26 · Jun 2, 2021 · Updated 4 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆74 · Apr 1, 2025 · Updated 10 months ago
- Common Voice Generator using Speech Synthesizer ☆13 · Jul 28, 2021 · Updated 4 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering ☆38 · Nov 7, 2021 · Updated 4 years ago
- Source codes and datasets for How well do Large Language Models perform in Arithmetic tasks? ☆57 · Apr 17, 2023 · Updated 2 years ago
- Scripts to create speech corpora from open.bible ☆13 · Jan 3, 2022 · Updated 4 years ago
- Post-editing Datasets by Rakuten (PEDRa) ☆14 · Jun 23, 2021 · Updated 4 years ago
- ☆19 · Sep 16, 2025 · Updated 4 months ago
- The FLORES+ Machine Translation Benchmark ☆110 · Nov 12, 2024 · Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆133 · Aug 21, 2024 · Updated last year
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… ☆295 · Feb 5, 2026 · Updated last week
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆64 · Feb 13, 2023 · Updated 3 years ago
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset. ☆340 · Dec 18, 2024 · Updated last year
- A library for data streaming and augmentation ☆21 · May 5, 2025 · Updated 9 months ago
- Can LLMs generate code-mixed sentences through zero-shot prompting? ☆11 · Apr 18, 2023 · Updated 2 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Apr 30, 2023 · Updated 2 years ago
- KIND: an Italian Multi-Domain Dataset for Named Entity Recognition ☆15 · Jun 28, 2023 · Updated 2 years ago