google-research/url-nlp

☆273

Alternatives and similar repositories for url-nlp

Users who are interested in url-nlp are comparing it to the repositories listed below.

cisnlp / Glot500
[ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
☆107 · Apr 14, 2026 · Updated 3 months ago
cisnlp / MEXA
[ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11 · Apr 6, 2025 · Updated last year
dadelani / sib-200
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
☆26 · May 20, 2026 · Updated 2 months ago
MicrosoftTranslator / NTREX
NTREX -- News Test References for MT Evaluation
☆87 · Jun 5, 2024 · Updated 2 years ago
Betswish / Cross-Lingual-Consistency
Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (supports LLaMA, BLOOM, mT5, RoBERTa, etc.). Paper he…
☆28 · Aug 8, 2025 · Updated 11 months ago
gauthelo / kallaama-speech-dataset
A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund.
☆20 · Mar 26, 2026 · Updated 4 months ago
antonisa / lang2vec
A simple library for querying the URIEL typological database.
☆97 · Apr 8, 2024 · Updated 2 years ago
ehsanasgari / 1000Langs
Creating super-parallel corpora of more than 1,500 unique languages for NLP research
☆33 · Dec 8, 2022 · Updated 3 years ago
alexandra-chron / lexical_xlm_relm
PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran…
☆18 · Oct 18, 2022 · Updated 3 years ago
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆29 · Apr 2, 2022 · Updated 4 years ago
uds-lsv / afro-maft
☆17 · Jan 12, 2023 · Updated 3 years ago
HKUNLP / multilingual-transfer
Code for paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability"
☆15 · Jun 13, 2023 · Updated 3 years ago
zwhe99 / SelfTraining4UNMT
[ACL 2022] Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation
☆31 · Oct 6, 2023 · Updated 2 years ago
swiss-ai / parity-aware-bpe
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [ACL 2026]
☆20 · Apr 18, 2026 · Updated 3 months ago
antonisa / unimorph_inflect
A Python library for easily querying morphological inflection models trained on UniMorph
☆13 · Oct 23, 2022 · Updated 3 years ago
google-research / nisaba
Finite-state script normalization and processing utilities
☆52 · Updated this week
cisnlp / multypo
A Multilingual Keyboard Layout-Based Typo Generator
☆17 · Nov 23, 2025 · Updated 8 months ago
dadelani / africanlp-resources
List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond
☆13 · Aug 15, 2022 · Updated 3 years ago
cisnlp / GlotWeb
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17 · Apr 14, 2026 · Updated 3 months ago
ntunlp / mulda
☆21 · Oct 26, 2021 · Updated 4 years ago
mhardalov / exams-qa
A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
☆49 · Apr 5, 2022 · Updated 4 years ago
mainlp / germanic-lrl-corpora
Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource…
☆28 · Feb 16, 2026 · Updated 5 months ago
cisnlp / simalign
[EMNLP 2020] Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
☆398 · Nov 7, 2023 · Updated 2 years ago
cisnlp / GlotLID
[EMNLP 2023] 💬 Language Identification with Support for More Than 2000 Labels
☆212 · Apr 15, 2026 · Updated 3 months ago
cisnlp / parcoure
ParCourE - Parallel Corpus Explorer
☆12 · Dec 27, 2021 · Updated 4 years ago
cisnlp / ofa
[NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 · Nov 26, 2023 · Updated 2 years ago
flairNLP / familiarity
Label shift estimation for transfer difficulty with Familiarity.
☆10 · Feb 4, 2025 · Updated last year
prohandler / GS-Bulk-Emails
Google Apps Script code that sends a number of emails from a specific number and tracks the open status of each email
☆17 · Dec 11, 2024 · Updated last year
laurieburchell / open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
☆77 · Apr 1, 2025 · Updated last year
allenai / numglue
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks
☆20 · May 10, 2022 · Updated 4 years ago
masakhane-io / lafand-mt
MAFAND-MT
☆63 · Jul 9, 2024 · Updated 2 years ago
marian-nmt / sotastream
A library for data streaming and augmentation
☆22 · May 5, 2025 · Updated last year
nlp-uoregon / mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
☆134 · Aug 21, 2024 · Updated last year
SimengSun / ChapterBreak
☆12 · Jun 5, 2024 · Updated 2 years ago
facebookresearch / belebele
Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
☆341 · Dec 18, 2024 · Updated last year
ltgoslo / simple_elmo_training
Minimal code to train ELMo models in recent versions of TensorFlow
☆14 · Jun 16, 2026 · Updated last month
christos-c / bible-corpus
A multilingual parallel corpus created from translations of the Bible.
☆197 · May 19, 2025 · Updated last year
ymoslem / MT-Preparation
Machine Translation (MT) Preparation Scripts
☆37 · May 25, 2025 · Updated last year
alpoktem / bible2speechDB
Scripts to create speech corpora from open.bible
☆13 · Jan 3, 2022 · Updated 4 years ago