[EMNLP 2023] Language Identification with Support for More Than 2000 Labels
☆200 · Apr 15, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for GlotLID
Users that are interested in GlotLID are comparing it to the libraries listed below.
- [LREC 2024] Resource and Tool for Writing System Identification ☆21 · Mar 29, 2026 · Updated last month
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆76 · Apr 1, 2025 · Updated last year
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ☆106 · Apr 14, 2026 · Updated 3 weeks ago
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models ☆11 · Jan 19, 2024 · Updated 2 years ago
- [ACL 2025] Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- ☆13 · Aug 23, 2024 · Updated last year
- OpusFilter - Parallel corpus processing toolkit ☆115 · Updated this week
- PathPiece tokenizer ☆14 · Nov 10, 2024 · Updated last year
- [NeurIPS 2024] GlotCC Dataset and Pipeline ☆20 · Apr 6, 2025 · Updated last year
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆36 · Jun 7, 2025 · Updated 11 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆3,041 · Updated this week
- GC4LM: A Colossal (Biased) language model for German ☆13 · May 2, 2021 · Updated 5 years ago