ijdutse / hausa-corpus
A collection of Hausa text datasets with their corresponding English translations.
☆14 · Updated 3 years ago
Alternatives and similar repositories for hausa-corpus:
Users interested in hausa-corpus are comparing it to the libraries listed below:
- Crosslingual Question Answering for African Languages ☆29 · Updated 5 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆69 · Updated 2 years ago
- This is a repository for NaijaSenti, a project funded by Lacuna Fund to develop a sentiment corpus for four Nigerian languages: Igbo, H… ☆31 · Updated last year
- Implementation of the "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆25 · Updated 2 years ago
- MAFAND-MT ☆55 · Updated 7 months ago
- Hinglish Text Classification ☆30 · Updated last year
- MasakhaNEWS: News Topic Classification for African Languages ☆19 · Updated 9 months ago
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… ☆37 · Updated 2 years ago
- Implementation of Z-BERT-A: a zero-shot pipeline for unknown intent detection. ☆39 · Updated last year
- All my experiments with the various transformers and various transformer frameworks available ☆14 · Updated 3 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 · Updated 2 years ago
- 🐍🍑 Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, … ☆17 · Updated 8 months ago
- ☆11 · Updated 3 years ago
- Documentation effort for the BookCorpus dataset ☆33 · Updated 3 years ago
- AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/ ☆48 · Updated last year
- ☆108 · Updated last year
- ☆11 · Updated 3 years ago
- Code for "CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection" (V. Blasch… ☆9 · Updated 4 years ago
- This repo contains 3 hours of audio speech recordings in Yoruba language collected for research purposes. ☆16 · Updated 4 years ago
- Goldfish: Monolingual language models for 350 languages. ☆15 · Updated 6 months ago
- ☆21 · Updated last month
- Fast whitespace correction with Transformers ☆15 · Updated 10 months ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Updated 3 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 · Updated last year
- Finite-state script normalization and processing utilities ☆39 · Updated this week
- Semantically distinct key phrase extraction using Hilbert hashes. ☆48 · Updated 3 years ago
- Summarizer in Python with spaCy and Universal Sentence Encoder, built on the Flask framework ☆20 · Updated last year
- This will hold the data pipeline to convert raw audio data to speech, which will act as the input dataset for the speech-to-text pipeline ☆32 · Updated 2 years ago