[ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
☆106 · Apr 14, 2026 · Updated last month
Alternatives and similar repositories for Glot500
Users that are interested in Glot500 are comparing it to the libraries listed below.
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models ☆11 · Jan 19, 2024 · Updated 2 years ago
- [LREC 2024] Resource and Tool for Writing System Identification ☆21 · Mar 29, 2026 · Updated 2 months ago
- [WWW 2026] GlotWeb: Web Indexing for Minority Languages ☆17 · Apr 14, 2026 · Updated last month
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆24 · May 20, 2026 · Updated last week
- [ACL 2025] Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- Python package to augment multilingual data ☆15 · Feb 15, 2023 · Updated 3 years ago
- The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco… ☆38 · Aug 29, 2025 · Updated 9 months ago
- [EMNLP 2023] Language Identification with Support for More Than 2000 Labels ☆204 · Apr 15, 2026 · Updated last month
- ☆13 · Aug 23, 2024 · Updated last year
- [NAACL 2024] A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Curriculum training ☆22 · Jun 25, 2025 · Updated 11 months ago
- ☆272 · Aug 1, 2025 · Updated 9 months ago
- ☆21 · Dec 5, 2022 · Updated 3 years ago
- ☆38 · Jun 3, 2021 · Updated 4 years ago
- Evaluation results for Machine Translation within the BigScience project ☆11 · May 15, 2023 · Updated 3 years ago
- State-of-the-art LLM-based translation models. ☆584 · Apr 9, 2025 · Updated last year
- Crosslingual Question Answering for African Languages ☆31 · Sep 27, 2024 · Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆25 · Dec 12, 2024 · Updated last year
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆28 · Aug 8, 2025 · Updated 9 months ago
- TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes ☆14 · Jul 1, 2025 · Updated 10 months ago
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" ☆26 · Jun 3, 2025 · Updated 11 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆58 · Feb 3, 2026 · Updated 3 months ago
- Official code and data of "3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset" ☆12 · Dec 8, 2024 · Updated last year
- ☆254 · May 30, 2024 · Updated last year
- This repository contains source code for the paper "Language Model Prior for Low-Resource Neural Machine Translation" ☆43 · Mar 16, 2021 · Updated 5 years ago
- The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1… ☆177 · Dec 31, 2024 · Updated last year
- ☆53 · Jun 6, 2023 · Updated 2 years ago
- A LaTeX template for LMU Master/Bachelor theses (paper+slides). ☆16 · May 22, 2019 · Updated 7 years ago
- System Combination ☆16 · Aug 28, 2015 · Updated 10 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆74 · Mar 2, 2024 · Updated 2 years ago
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… ☆306 · May 9, 2026 · Updated 2 weeks ago
- The implementation for our paper, "Improving Simultaneous Machine Translation with Monolingual Data," accepted to AAAI 2023. ☆12 · Jul 19, 2023 · Updated 2 years ago
- Finite-state script normalization and processing utilities ☆49 · May 8, 2026 · Updated 3 weeks ago
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆248 · Jul 26, 2024 · Updated last year
- Code for paper “Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability” ☆15 · Jun 13, 2023 · Updated 2 years ago
- ☆35 · Jun 15, 2023 · Updated 2 years ago
- ☆21 · Feb 13, 2023 · Updated 3 years ago
- SiLLM is a Simultaneous Machine Translation (SiMT) Framework. It utilizes a Large Language model as the translation model and employs a t… ☆18 · Feb 22, 2024 · Updated 2 years ago