[EMNLP 2023] Language Identification with Support for More Than 2000 Labels
★ 207 · Apr 15, 2026 · Updated last month
Alternatives and similar repositories for GlotLID
Users that are interested in GlotLID are comparing it to the libraries listed below.
- [LREC 2024] Resource and Tool for Writing System Identification · ★ 21 · Mar 29, 2026 · Updated 2 months ago
- [WWW 2026] GlotWeb: Web Indexing for Minority Languages · ★ 17 · Apr 14, 2026 · Updated last month
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) · ★ 76 · Apr 1, 2025 · Updated last year
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining · ★ 18 · Nov 26, 2023 · Updated 2 years ago
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages · ★ 106 · Apr 14, 2026 · Updated last month
- [ACL 2025] Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment · ★ 11 · Apr 6, 2025 · Updated last year
- OpusFilter - Parallel corpus processing toolkit · ★ 115 · Updated this week
- PathPiece tokenizer · ★ 14 · Nov 10, 2024 · Updated last year
- [NeurIPS 2024] GlotCC Dataset and Pipeline · ★ 20 · Apr 6, 2025 · Updated last year
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" · ★ 37 · Jun 7, 2025 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ★ 3,077 · May 26, 2026 · Updated 2 weeks ago
- Tokenizers.js: A pure JS/TS implementation of today's most used tokenizers · ★ 51 · May 26, 2026 · Updated 2 weeks ago
- ★ 242 · Oct 27, 2025 · Updated 7 months ago
- Tool to fix bitexts and tag near-duplicates for removal · ★ 35 · Sep 4, 2025 · Updated 9 months ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects · ★ 24 · May 20, 2026 · Updated 3 weeks ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. · ★ 90 · Sep 12, 2024 · Updated last year
- ParCourE - Parallel Corpus Explorer · ★ 12 · Dec 27, 2021 · Updated 4 years ago
- A framework for evaluating Machine Translation models. · ★ 12 · Apr 21, 2026 · Updated last month
- NTREX -- News Test References for MT Evaluation · ★ 87 · Jun 5, 2024 · Updated 2 years ago
- ★ 12 · Mar 17, 2026 · Updated 2 months ago
- ★ 59 · Nov 18, 2025 · Updated 6 months ago
- Do Multilingual Language Models Think Better in English? · ★ 42 · Aug 3, 2023 · Updated 2 years ago