ijdutse / hausa-corpus
A collection of textual datasets in the Hausa language and their corresponding English translations.
☆16 Updated 4 years ago
Alternatives and similar repositories for hausa-corpus
Users who are interested in hausa-corpus are comparing it to the libraries listed below.
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆77 Updated 3 years ago
- Crosslingual Question Answering for African Languages ☆31 Updated last year
- Shoonya - Platform to Annotate and label data at scale. ☆58 Updated last year
- Yorùbá language training text for NLP, ASR and TTS tasks ☆81 Updated 2 years ago
- Finite-state script normalization and processing utilities ☆43 Updated last month
- CorrectLy - Open Source Spelling & Grammar correction ☆43 Updated 2 years ago
- MAFAND-MT ☆59 Updated last year
- Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based models, check: https://github.com/AI4Bharat/… ☆60 Updated 4 years ago
- ☆57 Updated 3 years ago
- Building an effective preprocessing tool for African languages ☆13 Updated last year
- Almost state-of-the-art text generation library ☆66 Updated last month
- GPTNERMED is a language model-generated, synthetic dataset and an open neural NER model for medical entities designed for German data. ☆16 Updated 2 years ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆111 Updated last year
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆33 Updated 2 weeks ago
- Master's thesis project in collaboration with Rasa, focusing on knowledge distillation from BERT into different very small networks and a… ☆13 Updated 3 years ago
- A tiny BERT for low-resource monolingual models ☆31 Updated last month
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆45 Updated 5 years ago
- Transliteration models for 21 Indic languages ☆101 Updated 2 years ago
- Translation demonstrator ☆34 Updated 5 years ago
- Masakhane Web is a translation web application solely for African languages. ☆37 Updated 2 years ago
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling ☆20 Updated last year
- Documentation effort for the BookCorpus dataset ☆34 Updated 4 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆40 Updated 3 years ago
- fastlangid, the only language identification package that supports Cantonese (zh-yue), simplified (zh-hans) and traditional Chinese (zh-ha… ☆41 Updated 2 years ago
- Web App Capable of Predicting Next Word Using BERT ☆14 Updated 2 years ago
- ☆10 Updated last year
- ☆20 Updated 3 years ago
- Named entity recognition for the legal domain ☆42 Updated 4 years ago
- Aksharamukha Python Library ☆55 Updated 9 months ago
- 🐍🍑 Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, … ☆20 Updated last year