masakhane-io/masakhanePreprocessor

masakhane-io / masakhanePreprocessor

Building an effective preprocessing tool for African languages

☆13

Alternatives and similar repositories for masakhanePreprocessor

Users that are interested in masakhanePreprocessor are comparing it to the libraries listed below.

Andrews2017 / KINNEWS-and-KIRNEWS-Corpus
Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text …
☆15 · Apr 26, 2024 · Updated 2 years ago
uds-lsv / menyo-20k_MT
MENYO-20k Corpus in "The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation" in MT Summit 2021
☆15 · Jan 16, 2023 · Updated 3 years ago
masakhane-io / masakhane-community
All our community docs! Start here! Let's put Africa on the NLP Map
☆68 · Apr 16, 2024 · Updated 2 years ago
Desire100 / Care-Me-AI-automated-Doctor
AutoDoc is a mobile app for iOS and Android for medical purpose that helps you chat with an automated general practice doctor and get a q…
☆10 · Jan 19, 2020 · Updated 6 years ago
whentze / to_method
A utility micro-crate for using `Into` more ergonomically.
☆12 · May 17, 2021 · Updated 5 years ago
masakhane-io / lafand-mt
MAFAND-MT
☆63 · Jul 9, 2024 · Updated 2 years ago
vertaix / Vendi-Sampling
☆13 · Jan 13, 2025 · Updated last year
anzeyimana / DeepKIN
DeepKIN -- A deep learning toolkit for Kinyarwanda NLP.
☆14 · Jun 4, 2025 · Updated last year
JohannesPertl / where_is_webb
Mobile app that provides notifications about the status of the James Webb Space Telescope
☆14 · Aug 3, 2023 · Updated 2 years ago
usb-rs / usb-async
Future-based USB host API for Rust
☆17 · Jun 7, 2019 · Updated 7 years ago
fpfeffer / WealthMobility
Visualizing Intergenerational Wealth Mobility and Racial Inequality
☆10 · Mar 21, 2019 · Updated 7 years ago
natalieweber / leipzig
A LaTeX package to typeset and index linguistic gloss abbreviations
☆16 · May 22, 2022 · Updated 4 years ago
lauhaide / clads
XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation…
☆10 · Nov 4, 2022 · Updated 3 years ago
HubSpot / cms-webpack-serverless-boilerplate
Boilerplate for bundling serverless functions with webpack locally, prior to uploading to the CMS.
☆14 · Mar 4, 2023 · Updated 3 years ago
anzeyimana / kinyabert-acl2022
☆19 · Feb 4, 2024 · Updated 2 years ago
lexibank / pylexibank
The python curation library for lexibank
☆21 · Jun 25, 2026 · Updated 3 weeks ago
gisfromscratch / gdelt-notebooks
Sample notebooks for using the Global Database of Events, Language and Tone (GDELT).
☆19 · Nov 8, 2020 · Updated 5 years ago
onset / lameta
The Metadata Editor for Transparent Archiving of language document materials
☆25 · Jul 16, 2026 · Updated last week
LaSTUS-TALN-UPF / TSAR-2022-Shared-Task
TSAR2022 Shared Task on Lexical Simplification - Datasets and Evaluation scripts
☆10 · Oct 27, 2022 · Updated 3 years ago
StuWatson / propawhat
☆16 · Mar 13, 2022 · Updated 4 years ago
OpenStratos / server-rs
OpenStratos written in Rust.
☆18 · Jun 18, 2023 · Updated 3 years ago
nlp-stat-test / nlp-stat-test
The NLPStatTest project
☆12 · Mar 12, 2022 · Updated 4 years ago
julmaxi / Abstractive-Timeline-Summarization
☆11 · Dec 8, 2022 · Updated 3 years ago
buschmo / Simple-German-Corpus
Code to create the dataset from "A New Aligned Simple German Corpus"
☆11 · Jan 8, 2024 · Updated 2 years ago
thisismattmiller / lcc-pdf-to-json
Conversion of the LCC outline schedules from PDF to JSON
☆28 · Apr 2, 2020 · Updated 6 years ago
jakobhellermann / bevy-contrib-inspector
bevy plugin for starting a webserver to visually edit bevy resources
☆22 · Jan 21, 2021 · Updated 5 years ago
NC0DER / GraphOfDocs
GraphOfDocs: Representing multiple documents as a single graph
☆21 · Jun 22, 2022 · Updated 4 years ago
EdisonScientific / kosmos-figures
Kosmos technical report figures, validation code, and reproducible analyses
☆29 · Nov 4, 2025 · Updated 8 months ago
dennlinger / klexikon
Klexikon: A German Dataset for Joint Summarization and Simplification
☆17 · Oct 5, 2022 · Updated 3 years ago
masakhane-io / masakhane-pos
POS for African languages
☆21 · Jun 25, 2025 · Updated last year
sobamchan / xscitldr
X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents (JCDL 2022)
☆14 · Jul 22, 2022 · Updated 4 years ago
folio-org / folio-install
Runbooks for FOLIO installation
☆21 · Feb 5, 2026 · Updated 5 months ago
unhcr / Jetson
http://jetson.unhcr.org
☆27 · Dec 1, 2023 · Updated 2 years ago
informagi / GEEER
Code supporting the paper Graph-Embedding Empowered Entity Retrieval
☆24 · Apr 11, 2025 · Updated last year
valentinhofmann / flota
☆18 · Feb 1, 2023 · Updated 3 years ago
chan0park / VoynaSlov
☆19 · Nov 14, 2022 · Updated 3 years ago
masakhane-io / masakhane-news
MasakhaNEWS: News Topic Classification for African Languages
☆26 · May 12, 2024 · Updated 2 years ago
eryk-mazus / sigh
Seamless Voice Interactions with LLMs
☆12 · Oct 28, 2023 · Updated 2 years ago
Andrews2017 / africanlp-public-datasets
A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.
☆116 · Apr 26, 2024 · Updated 2 years ago