cisnlp/Glot500

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023

☆106

Alternatives and similar repositories for Glot500

Users who are interested in Glot500 compare it to the repositories listed below.

cisnlp / mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
☆11 · Jan 19, 2024 · Updated 2 years ago
cisnlp / GlotWeb
🕸 GlotWeb: Web Indexing for Minority Languages (WWW 2026)
☆17 · Feb 27, 2026 · Updated last week
cisnlp / GlotScript
🖋 Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024
☆21 · Feb 17, 2026 · Updated 2 weeks ago
cisnlp / MEXA
🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11 · Apr 6, 2025 · Updated 11 months ago
Unbabel / smaug
Python package to augment multilingual data
☆15 · Feb 15, 2023 · Updated 3 years ago
ZurichNLP / ContraDecode
The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco…
☆36 · Aug 29, 2025 · Updated 6 months ago
dadelani / sib-200
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
☆23 · Jan 26, 2025 · Updated last year
hplt-project / OpusTrainer
Curriculum training
☆22 · Jun 25, 2025 · Updated 8 months ago
ehsanasgari / 1000Langs
Creating super-parallel corpora of more than 1500 unique languages for NLP research
☆34 · Dec 8, 2022 · Updated 3 years ago
fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
cisnlp / ofa
A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 · Nov 26, 2023 · Updated 2 years ago
laurieburchell / open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
☆74 · Apr 1, 2025 · Updated 11 months ago
google-research / url-nlp
☆267 · Aug 1, 2025 · Updated 7 months ago
ahmetustun / hyperx
☆21 · Dec 5, 2022 · Updated 3 years ago
MaxyLee / 3AM
Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset"
☆12 · Dec 8, 2024 · Updated last year
rbawden / mt-bigscience
Evaluation results for Machine Translation within the BigScience project
☆11 · May 15, 2023 · Updated 2 years ago
mt-upc / ZeroSwot
Pushing the Limits of Zero-shot End-to-End Speech Translation
☆26 · Dec 12, 2024 · Updated last year
hplt-project / OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
☆58 · Feb 3, 2026 · Updated last month
ZurichNLP / multilingual-instruction-tuning
Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?"
☆26 · Jun 3, 2025 · Updated 9 months ago
Betswish / Cross-Lingual-Consistency
Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (supports LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he…
☆27 · Aug 8, 2025 · Updated 7 months ago
UNHSAILLab / TaCo
TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes
☆13 · Jul 1, 2025 · Updated 8 months ago
hexuandeng / Mono4SiMT
The implementation for our paper, "Improving Simultaneous Machine Translation with Monolingual Data," accepted to AAAI 2023. 🎉
☆12 · Jul 19, 2023 · Updated 2 years ago
cisnlp / GlotLID
💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
☆190 · Updated this week
zjwang21 / StrokeNet
The official code for our EMNLP 2022 long paper [Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation…
☆26 · Sep 10, 2025 · Updated 5 months ago
LeslieOverfitting / selective_distillation
☆38 · Jun 3, 2021 · Updated 4 years ago
fe1ixxu / ALMA
State-of-the-art LLM-based translation models.
☆579 · Apr 9, 2025 · Updated 10 months ago
wxjiao / ParroT
The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1…
☆177 · Dec 31, 2024 · Updated last year
kpu / MEMT
System Combination
☆16 · Aug 28, 2015 · Updated 10 years ago
facebookresearch / stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te…
☆297 · Updated this week
masakhane-io / afriqa
Crosslingual Question Answering for African Languages
☆30 · Sep 27, 2024 · Updated last year
google-research / xtreme-up
☆53 · Jun 6, 2023 · Updated 2 years ago
hsing-wang / Awesome-LLM-MT
☆254 · May 30, 2024 · Updated last year
bigscience-workshop / multilingual-modeling
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆74 · Mar 2, 2024 · Updated 2 years ago
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆162 · Sep 18, 2025 · Updated 5 months ago
UBC-NLP / afrolid
AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages.
☆36 · Feb 5, 2026 · Updated last month
UriSha / EmbeddinglessNMT
The implementation of "Neural Machine Translation without Embeddings", NAACL 2021
☆33 · Jun 9, 2021 · Updated 4 years ago
ictnlp / DiverseNMT
Source code for the AAAI 2020 long paper "Modeling Fluency and Faithfulness for Diverse Neural Machine Translation".
☆19 · Mar 10, 2020 · Updated 5 years ago
gabrielStanovsky / mt_gender
☆55 · Apr 26, 2022 · Updated 3 years ago
konstantinjdobler / focus
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆36 · Jun 7, 2025 · Updated 9 months ago