[EMNLP 2023] Language Identification with Support for More Than 2000 Labels
★207 · Apr 15, 2026 · Updated 2 months ago
Alternatives and similar repositories for GlotLID
Users that are interested in GlotLID are comparing it to the libraries listed below.
- [LREC 2024] Resource and Tool for Writing System Identification · ★22 · Mar 29, 2026 · Updated 3 months ago
- [WWW 2026] GlotWeb: Web Indexing for Minority Languages · ★17 · Apr 14, 2026 · Updated 2 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) · ★77 · Apr 1, 2025 · Updated last year
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining · ★18 · Nov 26, 2023 · Updated 2 years ago
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages · ★107 · Apr 14, 2026 · Updated 2 months ago
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models · ★11 · Jan 19, 2024 · Updated 2 years ago
- [NeurIPS 2024] GlotCC Dataset and Pipeline · ★20 · Apr 6, 2025 · Updated last year
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" · ★37 · Jun 7, 2025 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ★3,138 · May 26, 2026 · Updated last month
- GC4LM: A Colossal (Biased) language model for German · ★13 · May 2, 2021 · Updated 5 years ago
- ★247 · Oct 27, 2025 · Updated 8 months ago
- Tool to fix bitexts and tag near-duplicates for removal · ★35 · Sep 4, 2025 · Updated 9 months ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. · ★90 · Sep 12, 2024 · Updated last year
- ParCourE - Parallel Corpus Explorer · ★12 · Dec 27, 2021 · Updated 4 years ago
- A framework for evaluating Machine Translation models. · ★12 · Apr 21, 2026 · Updated 2 months ago
- NTREX -- News Test References for MT Evaluation · ★87 · Jun 5, 2024 · Updated 2 years ago
- ★11 · Mar 17, 2026 · Updated 3 months ago
- ★59 · Nov 18, 2025 · Updated 7 months ago
- Do Multilingual Language Models Think Better in English? · ★42 · Aug 3, 2023 · Updated 2 years ago
- ★15 · Oct 4, 2024 · Updated last year
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling. · ★65 · Jul 29, 2024 · Updated last year
- ★273 · Aug 1, 2025 · Updated 10 months ago
- POS for African languages · ★21 · Jun 25, 2025 · Updated last year
- Universal Romanizer that can convert any Unicode script to Roman (Latin) script · ★248 · Jul 26, 2024 · Updated last year
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… · ★27 · Feb 16, 2026 · Updated 4 months ago
- Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging (AACL'22) · ★11 · Aug 25, 2023 · Updated 2 years ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] · ★33 · Jan 23, 2025 · Updated last year
- Package to align tokens from different tokenizations. · ★16 · Mar 25, 2024 · Updated 2 years ago
- ParaNames: A multilingual resource for parallel names · ★40 · May 20, 2024 · Updated 2 years ago
- ★16 · Jun 14, 2024 · Updated 2 years ago
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization · ★19 · Mar 7, 2025 · Updated last year
- Code for Zero-Shot Tokenizer Transfer · ★145 · Jan 14, 2025 · Updated last year
- Minimum Bayes Risk Decoding for Hugging Face Transformers · ★60 · Jun 3, 2024 · Updated 2 years ago
- A tool that locates, downloads, and extracts machine translation corpora · ★165 · Apr 13, 2026 · Updated 2 months ago
- ★144 · Apr 8, 2026 · Updated 2 months ago
- Facebook Low Resource (FLoRes) MT Benchmark · ★769 · Nov 20, 2023 · Updated 2 years ago
- A Bias Tester framework for LLMs · ★24 · Mar 25, 2026 · Updated 3 months ago
- Seed Machine Translation Data · ★34 · Nov 12, 2024 · Updated last year
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text · ★1,746 · Jun 18, 2026 · Updated last week