A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.
☆114 · Apr 26, 2024 · Updated 2 years ago
Alternatives and similar repositories for africanlp-public-datasets
Users that are interested in africanlp-public-datasets are comparing it to the libraries listed below.
- Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text … ☆15 · Apr 26, 2024 · Updated 2 years ago
- This is an ASR corpus for Bemba language. It contains read speech from diverse publicly available Bemba sources; Literature Books, Radio/… ☆39 · Jul 31, 2025 · Updated 9 months ago
- A repository containing links to useful phonological software ☆12 · Feb 16, 2023 · Updated 3 years ago
- Scripts to create speech corpora from open.bible ☆13 · Jan 3, 2022 · Updated 4 years ago
- The Kinyarwanda model for DeepSpeech ☆17 · May 11, 2021 · Updated 5 years ago
- Machine Translation for Africa ☆319 · Jun 14, 2022 · Updated 3 years ago
- ☆12 · Mar 7, 2022 · Updated 4 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆45 · Oct 13, 2022 · Updated 3 years ago
- Scripts for working with open.bible data ☆26 · Jan 24, 2022 · Updated 4 years ago
- MAFAND-MT ☆62 · Jul 9, 2024 · Updated last year
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆84 · May 31, 2022 · Updated 3 years ago
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆38 · Oct 14, 2025 · Updated 7 months ago
- ☆121 · Oct 15, 2025 · Updated 7 months ago
- 🫠 check your data, before you wreck your model ☆16 · Aug 11, 2022 · Updated 3 years ago
- A webapp for the syntax-prosody analyst working in Optimality Theory, with automated Gen, Con and Eval. Download build files from syntax-… ☆14 · Sep 27, 2023 · Updated 2 years ago
- All our community docs! Start here! Let's put Africa on the NLP Map ☆67 · Apr 16, 2024 · Updated 2 years ago
- Hosts text-to-speech corpus and speech synthesizers for African languages. ☆18 · May 31, 2023 · Updated 2 years ago
- Python classes for the Buckeye Corpus ☆26 · Mar 30, 2018 · Updated 8 years ago
- Creates video from TTS output and viseme images. ☆17 · Jun 18, 2022 · Updated 3 years ago
- ☆11 · Jul 12, 2021 · Updated 4 years ago
- Open Source Crimean Tatar Text-to-Speech datasets ☆14 · Feb 23, 2025 · Updated last year
- An apa7 template for quarto/posit ☆12 · Jan 25, 2023 · Updated 3 years ago
- A wrapper, a lemmatizer and REST API implemented in Python for emMorph (Humor) Hungarian morphological analyzer ☆11 · Feb 11, 2021 · Updated 5 years ago
- ☆15 · May 24, 2022 · Updated 4 years ago
- ☆19 · Feb 4, 2024 · Updated 2 years ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- POS for African languages ☆21 · Jun 25, 2025 · Updated 11 months ago
- ☆14 · Jan 25, 2026 · Updated 4 months ago
- SERENGETI: Massively Multilingual Language Models for Africa ☆17 · Oct 26, 2023 · Updated 2 years ago
- Linguistic processing for Common Voice ☆59 · Jan 18, 2024 · Updated 2 years ago
- A fasttrack implementation in python ☆13 · Feb 10, 2026 · Updated 3 months ago
- Trainable algorithm for automatic measurement of voice onset time ☆69 · Jul 26, 2023 · Updated 2 years ago
- An R package for implementing and evaluating Maximum Entropy Optimality Theory models ☆10 · May 14, 2026 · Updated 2 weeks ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆24 · May 20, 2026 · Updated last week
- Hausa-NMT: Empirical Study of Neural Machine translation for English-Hausa-English ☆17 · Oct 20, 2020 · Updated 5 years ago
- Praat-based tools for EGG analysis ☆20 · Sep 21, 2023 · Updated 2 years ago
- Read in a 'Praat' 'TextGrid' File ☆17 · Oct 28, 2025 · Updated 7 months ago
- Download Zindi's compositions datasets directly to Google Colab ☆14 · Feb 29, 2020 · Updated 6 years ago
- REST API for Mozilla DeepSpeech voice recognition engine ☆20 · Nov 1, 2021 · Updated 4 years ago