cisnlp / GlotLID
🔬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
★186 · Nov 19, 2025 · Updated 2 months ago
Alternatives and similar repositories for GlotLID
Users who are interested in GlotLID are comparing it to the libraries listed below.
- 🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction. ★17 · Aug 13, 2025 · Updated 6 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ★74 · Apr 1, 2025 · Updated 10 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ★18 · Nov 26, 2023 · Updated 2 years ago
- PathPiece tokenizer ★13 · Nov 10, 2024 · Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ★106 · Apr 20, 2024 · Updated last year
- ★13 · Aug 23, 2024 · Updated last year
- OpusFilter - Parallel corpus processing toolkit ★115 · Updated this week
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models ★11 · Jan 19, 2024 · Updated 2 years ago
- ★14 · Oct 4, 2024 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,885 · Updated this week
- Tool to fix bitexts and tag near-duplicates for removal ★34 · Sep 4, 2025 · Updated 5 months ago
- ★220 · Oct 27, 2025 · Updated 3 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ★279 · Nov 3, 2023 · Updated 2 years ago
- POS for African languages ★19 · Jun 25, 2025 · Updated 7 months ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ★36 · Jun 7, 2025 · Updated 8 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ★60 · Jun 3, 2024 · Updated last year
- NTREX -- News Test References for MT Evaluation ★88 · Jun 5, 2024 · Updated last year
- ★59 · Nov 18, 2025 · Updated 2 months ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ★27 · Updated this week
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ★87 · Sep 12, 2024 · Updated last year
- Do Multilingual Language Models Think Better in English? ★42 · Aug 3, 2023 · Updated 2 years ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ★64 · Jul 29, 2024 · Updated last year
- A framework for evaluating Machine Translation models. ★12 · May 26, 2025 · Updated 8 months ago
- Taranis NG is an OSINT gathering and analysis tool for CSIRT teams and organisations. It allows team-to-team collaboration, and contains … ★10 · Oct 17, 2023 · Updated 2 years ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ★23 · Jan 26, 2025 · Updated last year
- A tool that locates, downloads, and extracts machine translation corpora ★162 · Sep 18, 2025 · Updated 4 months ago
- Pipeline for pulling and processing online language model pretraining data from the web ★176 · Jul 31, 2023 · Updated 2 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ★30 · Updated this week
- Facebook Low Resource (FLoRes) MT Benchmark ★762 · Nov 20, 2023 · Updated 2 years ago
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset. ★340 · Dec 18, 2024 · Updated last year
- statically generated weekly digest of articles read in Pocket ★10 · May 14, 2019 · Updated 6 years ago
- Web archiving utility library ★11 · Dec 3, 2025 · Updated 2 months ago
- Seamless Voice Interactions with LLMs ★12 · Oct 28, 2023 · Updated 2 years ago
- A libre software which is providing a backend architecture for collecting data from probes and storing proof of checks. ★11 · Jan 16, 2026 · Updated 3 weeks ago
- ★10 · Feb 12, 2024 · Updated 2 years ago
- Ivanti Pulse Secure CVE-2023-46805 Scanner - Based on Assetnote's Research ★12 · Jan 19, 2024 · Updated 2 years ago
- [ACM-MM 2025 Workshop] More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment. ★25 · Nov 25, 2025 · Updated 2 months ago
- ★263 · Aug 1, 2025 · Updated 6 months ago
- Sequence models in Numpy ★25 · Oct 9, 2020 · Updated 5 years ago