Toluwase / Word-Level-Language-Identification-for-Resource-Scarce-
English, Hausa, Igbo and Yoruba corpora and results (presented in Excel files) of word-level language identification research using character trigrams of the featured languages
☆15 Updated 6 years ago
Alternatives and similar repositories for Word-Level-Language-Identification-for-Resource-Scarce-
Users who are interested in Word-Level-Language-Identification-for-Resource-Scarce- are comparing it to the repositories listed below
- Yorùbá language training text for NLP, ASR and TTS tasks ☆77 Updated 2 years ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆106 Updated last year
- Automatic Diacritic Restoration of Yorùbá language Text ☆24 Updated 11 months ago
- A small python script that transliterates Arabic text using the Buckwalter Transliteration Scheme. It allows for multiple decisions to be… ☆26 Updated 11 years ago
- The repository contains all the codes necessary for my project - Automatic Speech Recognition System in Hindi Language ( Project descript… ☆28 Updated 5 years ago
- ☆43 Updated 2 years ago
- Xlit-Crowd: Hindi-English Transliteration Corpus ☆38 Updated 10 years ago
- Machine Translation for Africa ☆289 Updated 3 years ago
- Python library for converting numbers to words for all Indian Languages. ☆35 Updated last month
- ☆43 Updated 7 years ago
- A Python based API to access Indian language WordNets. ☆38 Updated 3 years ago
- A curated list of research papers and resources on code-switching ☆318 Updated 6 months ago
- ☆69 Updated last year
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆74 Updated 3 years ago
- Repository containing experimentation platform on how to train, infer on wav2vec2 models. ☆87 Updated 2 years ago
- A Simple Flask App to interact with your Machine Translation Model ☆12 Updated 5 years ago
- This is a package in Python which implements a tokenizer, stemmer for Hindi language ☆95 Updated 4 years ago
- The first Dialectal Arabic Code Switching - DACS corpus from broadcast speech. Annotated at the token-level, considering both the linguis… ☆14 Updated 3 years ago
- Python library for converting UTF to WX and vice-versa for Indian languages. ☆47 Updated 3 years ago
- This repository ☆30 Updated 2 years ago
- ☆42 Updated 3 years ago
- Server framework for Kaldi ASR Toolkit ☆97 Updated last year
- All our community docs! Start here! Lets put Africa on the NLP Map ☆60 Updated last year
- A guide to building language technology in new languages. ☆58 Updated 3 years ago
- Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2 ☆114 Updated 6 years ago
- Automatically extract grammatical edits from parallel original and corrected sentences. ☆11 Updated 8 years ago
- A module for normalising text. ☆174 Updated 3 years ago
- Curated list of publicly available parallel corpus for Indian Languages ☆33 Updated 3 years ago
- ☆51 Updated 3 years ago
- Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.c… ☆286 Updated 2 years ago