ijdutse / hausa-corpus
A collection of textual datasets in the Hausa language with corresponding English translations.
☆15 Updated 4 years ago
Alternatives and similar repositories for hausa-corpus:
Users interested in hausa-corpus are comparing it to the repositories listed below.
- Crosslingual Question Answering for African Languages ☆29 Updated 6 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆72 Updated 2 years ago
- MasakhaNEWS: News Topic Classification for African Languages ☆23 Updated 10 months ago
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆32 Updated last year
- MAFAND-MT ☆55 Updated 8 months ago
- Almost state of art text generation library ☆66 Updated 5 months ago
- Tool to take your ML model from local to production with one-line of code. ☆25 Updated last year
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati… ☆37 Updated 2 years ago
- All my experiments with the various transformers and various transformer frameworks available ☆14 Updated 3 years ago
- Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based models, check: https://github.com/AI4Bharat/… ☆60 Updated 3 years ago
- Hausa-NMT: Empirical Study of Neural Machine translation for English-Hausa-English ☆15 Updated 4 years ago
- Building an effective preprocessing tool for African languages ☆12 Updated last year
- Shoonya - Platform to Annotate and label data at scale. ☆53 Updated 6 months ago
- ☆20 Updated 3 years ago
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling ☆18 Updated 7 months ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- Common crawl pretrained sentencepiece tokenizers for English and Japanese for various vocabulary sizes. Also development environment for … ☆10 Updated 3 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 Updated last year
- ☆109 Updated last year
- ☆22 Updated 10 months ago
- Web App Capable of Predicting Next Word Using BERT ☆13 Updated 2 years ago
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆25 Updated 2 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆39 Updated 2 years ago
- 🫠 check your data, before you wreck your model ☆16 Updated 2 years ago
- Fast whitespace correction with Transformers ☆16 Updated 11 months ago
- Documentation effort for the BookCorpus dataset ☆34 Updated 3 years ago
- Predicting what word comes next with Tensorflow. ☆10 Updated last year
- Hinglish Text Classification ☆30 Updated last year
- Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text … ☆12 Updated 11 months ago
- GPTNERMED is a language model-generated, synthetic dataset and an open neural NER model for medical entities designed for German data. ☆16 Updated last year