ijdutse / hausa-corpus
A collection of textual datasets in the Hausa language with corresponding English translations.
☆16 Updated 4 years ago
Alternatives and similar repositories for hausa-corpus
Users who are interested in hausa-corpus are comparing it to the repositories listed below.
- Crosslingual Question Answering for African Languages ☆31 Updated 11 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆77 Updated 3 years ago
- Building an effective preprocessing tool for African languages ☆13 Updated last year
- Documentation effort for the BookCorpus dataset ☆34 Updated 4 years ago
- MAFAND-MT ☆57 Updated last year
- Almost state of art text generation library ☆66 Updated last month
- COMET for African languages ☆10 Updated 7 months ago
- Masakhane Web is a translation web application for solely African Languages. ☆37 Updated 2 years ago
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆33 Updated last year
- Shoonya - Platform to Annotate and label data at scale. ☆57 Updated last year
- A tiny BERT for low-resource monolingual models ☆31 Updated 11 months ago
- Yorùbá language training text for NLP, ASR and TTS tasks ☆80 Updated 2 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training ☆44 Updated 4 years ago
- CorrectLy - Open Source Spelling & Grammar correction ☆42 Updated 2 years ago
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati… ☆41 Updated 2 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spacy provide state of … ☆61 Updated 5 years ago
- A collection of preprocessed datasets and pretrained models for generating paraphrases. ☆30 Updated 4 years ago
- Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based models, check: https://github.com/AI4Bharat/… ☆60 Updated 4 years ago
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 Updated 2 years ago
- Using Machine Learning to Create Funny Memes ☆25 Updated 2 years ago
- Implementation of Z-BERT-A: a zero-shot pipeline for unknown intent detection. ☆42 Updated 2 years ago
- An example of multilingual machine translation using a pretrained version of mt5 from Hugging Face. ☆42 Updated 4 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆89 Updated last year
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling ☆20 Updated last year
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆24 Updated 3 years ago
- URL downloader supporting checkpointing and continuous checksumming. ☆19 Updated last year
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 4 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 Updated last year
- Using short models to classify long texts ☆21 Updated 2 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆40 Updated 2 years ago