Andrews2017/africanlp-public-datasets

A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.

☆117

Alternatives and similar repositories for africanlp-public-datasets

Users that are interested in africanlp-public-datasets are comparing it to the libraries listed below.

Andrews2017 / KINNEWS-and-KIRNEWS-Corpus
Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text …
☆15 · Apr 26, 2024 · Updated 2 years ago
csikasote / BembaSpeech
This is an ASR corpus for the Bemba language. It contains read speech from diverse publicly available Bemba sources; Literature Books, Radio/…
☆41 · Jul 31, 2025 · Updated 11 months ago
connormayer / phonological_software
A repository containing links to useful phonological software
☆12 · Feb 16, 2023 · Updated 3 years ago
Digital-Umuganda / Deepspeech-Kinyarwanda
The Kinyarwanda model for DeepSpeech
☆17 · May 11, 2021 · Updated 5 years ago
masakhane-io / masakhane-mt
Machine Translation for Africa
☆322 · Jun 14, 2022 · Updated 4 years ago
asmelashteka / HornMT
Machine translation (MT) benchmark dataset for languages in the Horn of Africa.
☆46 · Oct 13, 2022 · Updated 3 years ago
coqui-ai / open-bible-scripts
Scripts for working with open.bible data
☆26 · Jan 24, 2022 · Updated 4 years ago
masakhane-io / lafand-mt
MAFAND-MT
☆63 · Jul 9, 2024 · Updated 2 years ago
castorini / afriberta
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages
☆83 · May 31, 2022 · Updated 4 years ago
maxbane / pyCelex
A Python module for reading and organizing data from CELEX2.
☆15 · Mar 20, 2019 · Updated 7 years ago
anzeyimana / DeepKIN
DeepKIN: A deep learning toolkit for Kinyarwanda NLP.
☆14 · Jun 4, 2025 · Updated last year
masakhane-io / masakhane-news
MasakhaNEWS: News Topic Classification for African Languages
☆26 · May 12, 2024 · Updated 2 years ago
masakhane-io / masakhanePreprocessor
Building an effective preprocessing tool for African languages
☆13 · Jan 24, 2024 · Updated 2 years ago
shraddhabarke / SyPhon
SyPhon: Constraint-based Learning of Phonological Rules
☆11 · Mar 5, 2025 · Updated last year
syntax-prosody-ot / main
A webapp for the syntax-prosody analyst working in Optimality Theory, with automated Gen, Con and Eval. Download build files from syntax-…
☆14 · Sep 27, 2023 · Updated 2 years ago
unza-speech-lab / zambezi-voice
Repository for multilingual speech data resources for native languages of Zambia.
☆22 · Oct 9, 2024 · Updated last year
UBC-NLP / serengeti
SERENGETI: Massively Multilingual Language Models for Africa
☆17 · Oct 26, 2023 · Updated 2 years ago
masakhane-io / masakhane-ner
☆122 · Oct 15, 2025 · Updated 9 months ago
masakhane-io / masakhane-community
All our community docs! Start here! Let's put Africa on the NLP Map
☆68 · Apr 16, 2024 · Updated 2 years ago
coqui-ai / data-checker
🫠 check your data, before you wreck your model
☆16 · Aug 11, 2022 · Updated 3 years ago
neulab / AfricanVoices
Hosts text-to-speech corpus and speech synthesizers for African languages.
☆19 · May 31, 2023 · Updated 3 years ago
aflr-archive / viseme-to-video
Creates video from TTS output and viseme images.
☆16 · Jun 18, 2022 · Updated 4 years ago
ellisk42 / bpl_phonology
☆16 · May 24, 2022 · Updated 4 years ago
telecombcn-dl / labs-all
Labs for deep learning courses at UPC ETSETB TelecomBCN.
☆17 · Jul 6, 2026 · Updated 3 weeks ago
alsonicr / quarto-apa7
An APA7 template for Quarto/Posit
☆12 · Jan 25, 2023 · Updated 3 years ago
Neurotech-HQ / tigopesa
Python package to ease the Tigo Pesa API integration
☆14 · May 3, 2021 · Updated 5 years ago
anzeyimana / kinyabert-acl2022
☆19 · Feb 4, 2024 · Updated 2 years ago
connormayer / maxent.ot
An R package for implementing and evaluating Maximum Entropy Optimality Theory models
☆10 · Updated this week
egorsmkv / qirimtatar-tts-datasets
Open Source Crimean Tatar Text-to-Speech datasets
☆14 · Feb 23, 2025 · Updated last year
masakhane-io / masakhane-pos
POS for African languages
☆21 · Jun 25, 2025 · Updated last year
pacotvj99 / testsampleR
☆14 · Jan 25, 2026 · Updated 6 months ago
dadelani / sib-200
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
☆26 · May 20, 2026 · Updated 2 months ago
futureverse / future.p2p
future.p2p: A Peer-to-Peer Compute Cluster via Futureverse
☆16 · Jul 15, 2026 · Updated 2 weeks ago
datenlabor-bmz / evals-for-every-language
Tracking language proficiency of AI models for every language
☆20 · Jun 10, 2026 · Updated last month
ftyers / commonvoice-utils
Linguistic processing for Common Voice
☆59 · Jan 18, 2024 · Updated 2 years ago
mlml / autovot
Trainable algorithm for automatic measurement of voice onset time
☆69 · Jul 26, 2023 · Updated 3 years ago
sayedmohamedscu / Zindi_colab
Download Zindi's competition datasets directly to Google Colab
☆14 · Feb 29, 2020 · Updated 6 years ago
gpu-poor / gramvaani_hindi_asr
This repo contains the baseline model recipes and pre-trained model for the GramVaani Hindi ASR challenge
☆16 · Mar 26, 2022 · Updated 4 years ago
tjmahr / readtextgrid
Read in a 'Praat' 'TextGrid' File
☆17 · Oct 28, 2025 · Updated 9 months ago