UBC-NLP / afrolid
AfroLID: a neural toolkit for African language identification that covers 517 African languages.
★ 31 · Updated 3 weeks ago
Alternatives and similar repositories for afrolid:
Users who are interested in afrolid are comparing it to the repositories listed below.
- Curriculum training · ★ 17 · Updated 3 weeks ago
- Resource and Tool for Writing System Identification -- LREC 2024 · ★ 13 · Updated 9 months ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. · ★ 103 · Updated 11 months ago
- Code for extracting parallel corpora from pmindia · ★ 16 · Updated 5 years ago
- A survey of corpora for Germanic low-resource languages and dialects · ★ 25 · Updated 3 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. · ★ 51 · Updated 2 months ago
- NTREX -- News Test References for MT Evaluation · ★ 81 · Updated 9 months ago
- Tool to fix bitexts and tag near-duplicates for removal · ★ 30 · Updated last month
- A tiny BERT for low-resource monolingual models · ★ 31 · Updated 6 months ago
- Multilingual Open Text · ★ 25 · Updated 5 months ago
- This code provides a word-level language identification tool that identifies the language of individual words in code-mixed text, e.g. The tex… · ★ 53 · Updated 4 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) · ★ 70 · Updated 11 months ago
- The central repo for Creole based NLU and NLG work · ★ 18 · Updated 10 months ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT · ★ 21 · Updated last year
- BERT and ELECTRA models trained on Europeana Newspapers · ★ 37 · Updated 3 years ago
- ★ 26 · Updated last month
- A guide to building language technology in new languages. · ★ 58 · Updated 3 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. · ★ 37 · Updated 2 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… · ★ 80 · Updated 8 months ago
- Statistics on multilingual datasets · ★ 17 · Updated 2 years ago
- ★ 109 · Updated last year
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. · ★ 39 · Updated 2 years ago
- ★ 17 · Updated 2 years ago
- ★ 42 · Updated 3 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki · ★ 22 · Updated last month
- AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/ · ★ 48 · Updated last year
- ★ 64 · Updated 2 years ago
- OpusFilter - Parallel corpus processing toolkit · ★ 104 · Updated this week
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. · ★ 74 · Updated last year
- Creating super-parallel corpora of more than 1,500 unique languages for NLP research · ★ 33 · Updated 2 years ago