Toluwase / Word-Level-Language-Identification-for-Resource-Scarce-
English, Hausa, Igbo, and Yoruba corpora, together with the results (presented in Excel files) of word-level language identification research based on character trigrams of the featured languages
☆15 · Updated 6 years ago
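
As a rough illustration of what word-level language identification with character trigrams can look like, here is a minimal sketch. It is not the repository's implementation: the function names (`char_trigrams`, `train`, `identify`), the add-one smoothing, and the tiny word lists (written without diacritics) are all assumptions made for this example, and the toy lists stand in for the real corpora.

```python
import math
from collections import Counter


def char_trigrams(word):
    """Return the character trigrams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpora):
    """Build one trigram-count table per language from {language: [words]}."""
    return {
        lang: Counter(t for w in words for t in char_trigrams(w))
        for lang, words in corpora.items()
    }


def identify(word, models):
    """Pick the language whose trigram model gives the word the highest
    add-one-smoothed log-probability (a hypothetical scoring choice)."""
    vocab_size = len(set().union(*models.values()))
    scores = {}
    for lang, counts in models.items():
        total = sum(counts.values())
        scores[lang] = sum(
            math.log((counts[t] + 1) / (total + vocab_size))
            for t in char_trigrams(word)
        )
    return max(scores, key=scores.get)


# Toy word lists for illustration only; not taken from the repository.
toy_corpora = {
    "english": ["the", "water", "house", "child", "come"],
    "hausa": ["ruwa", "gida", "yaro", "zo", "kudi"],
    "igbo": ["mmiri", "ulo", "nwa", "bia", "ego"],
    "yoruba": ["omi", "ile", "omo", "wa", "owo"],
}
models = train(toy_corpora)

if __name__ == "__main__":
    for w in ["water", "omi", "gida"]:
        print(w, "->", identify(w, models))
```

With corpora this small the scores mostly reflect memorised words; a realistic setup would train on much larger word lists per language and evaluate on held-out text.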
Related projects
Alternatives and complementary repositories for Word-Level-Language-Identification-for-Resource-Scarce-
- Yorùbá language training text for NLP, ASR and TTS tasks ☆73 · Updated last year
- Unsupervised Neural Machine Translation from West African Pidgin (Creole) to English without a single parallel sentence ☆75 · Updated 4 years ago
- Automatic Diacritic Restoration of Yorùbá language Text ☆24 · Updated 4 months ago
- A curated list of research papers and resources on code-switching ☆298 · Updated 3 weeks ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆93 · Updated 7 months ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆66 · Updated 2 years ago
- A Collection of Research Papers by Data Science Nigeria ☆25 · Updated 9 months ago
- Ìrànlọ́wọ́ is a utility library for analysis & (pre)processing of Yorùbá text → https://pypi.org/project/iranlowo ☆17 · Updated last year
- How to extract sentiment from opinions without any labels ☆137 · Updated 2 years ago
- Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2 ☆112 · Updated 5 years ago
- All our community docs! Start here! Lets put Africa on the NLP Map ☆54 · Updated 7 months ago
- Tutorial for BERT (and other transformer) embeddings with spaCy and Rasa ☆65 · Updated 4 years ago
- Machine Translation for Africa ☆278 · Updated 2 years ago
- 📄 A repo containing notes and discussions for our weekly NLP/ML paper discussions. ☆149 · Updated 4 years ago
- Collection of Deep Learning Text Classification Models in Keras; Includes a GPU tutorial. ☆14 · Updated 6 years ago
- Punctuation restoration and spell correction experiments. ☆249 · Updated 3 years ago
- Automatic Dialect Detection Repository ☆39 · Updated 2 years ago
- Almost state of art text generation library ☆66 · Updated 3 weeks ago
- A guide to building language technology in new languages. ☆57 · Updated 2 years ago
- Python package to convert text from English to Indian Languages ☆9 · Updated 3 years ago
- docker for HF wav2vec2-sprint ☆12 · Updated 3 years ago
- Repository containing experimentation platform on how to train, infer on wav2vec2 models. ☆85 · Updated 2 years ago
- ☆105 · Updated 11 months ago
- A Hindi-English Dataset for Text Normalization ☆14 · Updated 2 years ago
- Arabic Dialect Identification on AOC data. ☆23 · Updated 5 years ago
- The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the datase… ☆190 · Updated 4 years ago
- Curated repository of notes from papers I'm reading, mostly NLP related. Updated regularly. ☆128 · Updated 3 years ago
- This is an ASR corpus for Bemba language. It contains read speech from diverse publicly available Bemba sources; Literature Books, Radio/… ☆32 · Updated 6 months ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆110 · Updated 2 years ago
- A virtual assistant that actually assists! ☆56 · Updated 2 years ago