SunbirdAI / salt-data-archive
Multi-way parallel text corpus of 5 key Ugandan languages.
☆17 Updated last year
Alternatives and similar repositories for salt-data-archive
Users who are interested in salt-data-archive are comparing it to the repositories listed below.
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆40 Updated 2 years ago
- Machine Translation for Africa ☆296 Updated 3 years ago
- All our community docs! Start here! Let's put Africa on the NLP Map ☆60 Updated last year
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆111 Updated last year
- Yorùbá language training text for NLP, ASR and TTS tasks ☆80 Updated 2 years ago
- Masakhane Web is a translation web application solely for African languages. ☆37 Updated 2 years ago
- ☆111 Updated last year
- Facebook Low Resource (FLoRes) MT Benchmark ☆753 Updated last year
- A large scale Sanskrit-English translation dataset ☆71 Updated 2 years ago
- This is a repository for NaijaSenti. A Lacuna-funded project for the development of a sentiment corpus for four Nigerian languages: Igbo, H… ☆33 Updated last year
- Morphological processing for languages of the Horn of Africa ☆46 Updated 2 weeks ago
- Minangkabau NLP corpus. PACLIC 2020 ☆10 Updated 4 years ago
- ☆20 Updated 3 years ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆77 Updated 3 years ago
- A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi. ☆484 Updated this week
- ☆12 Updated 3 years ago
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ☆32 Updated 6 months ago
- This repository contains multi-modal speech data for African languages that can be used to train ASR and NLP models ☆11 Updated 3 years ago
- Arabic edition of BERT pretrained language models ☆132 Updated 4 years ago
- Transliteration for languages and dialects ☆43 Updated 3 years ago
- Build gpt-index using ChatGPT and sentence-transformers ☆14 Updated 2 years ago
- A parallel corpus of Sorani, Kurmanji and English ☆13 Updated 4 years ago
- ☆74 Updated 2 years ago
- A collaborative catalog of NLP resources for Indic languages ☆615 Updated 9 months ago
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations ☆788 Updated last year
- English to Twi translation system being put together by the GhanaNLP team ☆35 Updated 3 months ago
- The largest public catalogue for Arabic NLP and speech datasets. There are 500+ datasets annotated with more than 25 attributes. ☆177 Updated 3 months ago
- A simple semi-supervised approach for creating Hugging Face data script loaders and uploading them to the hub. ☆11 Updated last year
- A French sequence-to-sequence pretrained model ☆62 Updated 3 years ago
- Open source speech to text models for Indic Languages ☆306 Updated 3 years ago