UBC-NLP / afrolid
AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages.
☆27Updated last year
Related projects
Alternatives and complementary repositories for afrolid
- Resource and Tool for Writing System Identification -- LREC 2024☆13Updated 5 months ago
- These are lists for a variety of languages containing words that are distinctive to each language.☆34Updated 2 years ago
- A survey of corpora for Germanic low-resource languages and dialects☆24Updated 3 months ago
- NTREX -- News Test References for MT Evaluation☆75Updated 5 months ago
- Curriculum training☆16Updated last month
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.☆90Updated 6 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)☆66Updated 6 months ago
- Multilingual Open Text☆25Updated last week
- Statistics on multilingual datasets☆17Updated 2 years ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.☆72Updated last year
- Tool to fix bitexts and tag near-duplicates for removal☆29Updated 2 months ago
- SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages☆7Updated 9 months ago
- Bicleaner fork that uses neural networks☆38Updated 3 months ago
- A spaCy custom component that extracts and normalizes temporal expressions☆52Updated last year
- Code for extracting parallel corpora from pmindia☆16Updated 4 years ago
- Multilingual and monolingual corpora focused on Caucasus languages for Natural Language Processing (NLP)☆31Updated last week
- A guide to building language technology in new languages.☆57Updated 2 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT☆21Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.☆48Updated 2 months ago
- Compiled tools, datasets, and other resources for historical text normalization.☆16Updated 5 years ago
- XED multilingual emotion datasets☆56Updated last year
- An easy-to-use library to extract indices from texts.☆29Updated 3 years ago
- OpusFilter - Parallel corpus processing toolkit☆102Updated 2 months ago
- This repository contains a demonstrative implementation for pooling-based models, e.g., DeepPyramidion complementing our paper "Sparsifyi…☆14Updated 2 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation☆24Updated last year