[ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
☆106 · Apr 14, 2026 · Updated this week
Alternatives and similar repositories for Glot500
Users that are interested in Glot500 are comparing it to the libraries listed below.
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models ☆11 · Jan 19, 2024 · Updated 2 years ago
- Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024 ☆21 · Mar 29, 2026 · Updated 2 weeks ago
- GlotWeb: Web Indexing for Minority Languages (WWW 2026) ☆17 · Feb 27, 2026 · Updated last month
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆23 · Jan 26, 2025 · Updated last year
- [ACL 2025] Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- Python package to augment multilingual data ☆15 · Feb 15, 2023 · Updated 3 years ago
- The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco… ☆38 · Aug 29, 2025 · Updated 7 months ago
- ☆13 · Aug 23, 2024 · Updated last year
- [NAACL 2024] A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Curriculum training ☆22 · Jun 25, 2025 · Updated 9 months ago
- ☆272 · Aug 1, 2025 · Updated 8 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆75 · Apr 1, 2025 · Updated last year
- ☆21 · Dec 5, 2022 · Updated 3 years ago
- ☆38 · Jun 3, 2021 · Updated 4 years ago
- Evaluation results for Machine Translation within the BigScience project ☆11 · May 15, 2023 · Updated 2 years ago
- Crosslingual Question Answering for African Languages ☆31 · Sep 27, 2024 · Updated last year
- Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging (AACL'22) ☆11 · Aug 25, 2023 · Updated 2 years ago
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆26 · Dec 12, 2024 · Updated last year
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆28 · Aug 8, 2025 · Updated 8 months ago
- Baby's CoThought: Leveraging LLMs for Enhanced Reasoning in Compact Models (BabyLM Challenge) ☆17 · Jan 10, 2025 · Updated last year
- TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes ☆14 · Jul 1, 2025 · Updated 9 months ago
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" ☆26 · Jun 3, 2025 · Updated 10 months ago
- Synthetic Data Generation for Evaluation ☆14 · Feb 21, 2025 · Updated last year
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆36 · Jun 7, 2025 · Updated 10 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆58 · Feb 3, 2026 · Updated 2 months ago
- Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset" ☆12 · Dec 8, 2024 · Updated last year
- This repository contains source code for the paper "Language Model Prior for Low-Resource Neural Machine Translation" ☆42 · Mar 16, 2021 · Updated 5 years ago
- The official code for our EMNLP 2022 long paper [Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation… ☆26 · Sep 10, 2025 · Updated 7 months ago
- ☆53 · Jun 6, 2023 · Updated 2 years ago
- A LaTeX template for LMU Master/Bachelor theses (paper+slides). ☆16 · May 22, 2019 · Updated 6 years ago
- System Combination ☆16 · Aug 28, 2015 · Updated 10 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆74 · Mar 2, 2024 · Updated 2 years ago
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… ☆302 · Updated this week
- The implementation for our paper, "Improving Simultaneous Machine Translation with Monolingual Data," accepted to AAAI 2023. ☆12 · Jul 19, 2023 · Updated 2 years ago
- Finite-state script normalization and processing utilities ☆47 · Updated this week
- Code for paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability" ☆15 · Jun 13, 2023 · Updated 2 years ago
- ☆35 · Jun 15, 2023 · Updated 2 years ago
- ☆21 · Feb 13, 2023 · Updated 3 years ago