UBC-NLP / afrolid
AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages.
★32 Updated 6 months ago
Alternatives and similar repositories for afrolid
Users who are interested in afrolid are comparing it to the libraries listed below.
- Resource and Tool for Writing System Identification -- LREC 2024 ★19 Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ★51 Updated 2 months ago
- GlotWeb: Web Indexing for Low-Resource Languages -- under construction. ★14 Updated last month
- A survey of corpora for Germanic low-resource languages and dialects ★25 Updated 9 months ago
- NTREX -- News Test References for MT Evaluation ★85 Updated last year
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ★74 Updated 5 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ★156 Updated last year
- Curriculum training ★18 Updated 2 months ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ★82 Updated last year
- A module to compute textual lexical richness (aka lexical diversity). ★110 Updated 2 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ★38 Updated 3 years ago
- ★49 Updated last year
- Statistics on multilingual datasets ★17 Updated 3 years ago
- ★111 Updated last year
- A tiny BERT for low-resource monolingual models ★31 Updated 11 months ago
- Cutting-edge experimental spaCy components and features ★101 Updated last year
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ★110 Updated last year
- Multilingual Open Text ★25 Updated 4 months ago
- ParaNames: A multilingual resource for parallel names ★36 Updated last year
- Repository for the paper "MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguatio… ★44 Updated last year
- A simple semi-supervised approach for creating Hugging Face dataset script loaders and uploading them to the hub. ★11 Updated last year
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ★68 Updated 2 years ago
- Arabic Dialect Identification on AOC data. ★24 Updated 6 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ★40 Updated 2 years ago
- Code and models used in "MUSS Multilingual Unsupervised Sentence Simplification by Mining Paraphrases". ★99 Updated 2 years ago
- Creating super-parallel corpora of 1500+ unique languages for NLP research ★34 Updated 2 years ago
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ★69 Updated 4 years ago
- ★75 Updated 3 weeks ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ★84 Updated last year
- SpaCy wrapper for ConceptNet ★95 Updated 2 years ago