Curated list of publicly available parallel corpora for Indian languages
☆36Jul 15, 2021Updated 4 years ago
Alternatives and similar repositories for Indian_ParallelCorpus
Users who are interested in Indian_ParallelCorpus are comparing it to the repositories listed below
- Language identification and normalisation in code switching data tailored with a three-step decoding process☆24Dec 23, 2019Updated 6 years ago
- Tooling to play around with multilingual machine translation for Indian Languages.☆22Mar 5, 2022Updated 4 years ago
- Creating super-parallel corpora of more than 1500 unique languages for NLP research☆34Dec 8, 2022Updated 3 years ago
- Repository for the English-Hindi Codemixed to Monolingual English Parallel Corpus☆13Feb 17, 2019Updated 7 years ago
- Hosts text-to-speech corpus and speech synthesizers for African languages.☆18May 31, 2023Updated 2 years ago
- Exploring the Limits of Low-Resource Neural Machine Translation☆34Feb 16, 2023Updated 3 years ago
- Repository for sharing the data in the Tamasheq language, one of the target languages for the low-resource speech translation track at IW…☆18Nov 30, 2022Updated 3 years ago
- A collaborative catalog of NLP resources for Indic languages☆627Dec 14, 2024Updated last year
- ☆45Jul 5, 2022Updated 3 years ago
- Code for the paper "Improving Robustness of Machine Translation with Synthetic Noise"☆21Dec 23, 2019Updated 6 years ago
- A benchmark for code-switched NLP, ACL 2020☆76May 28, 2024Updated last year
- ☆23Nov 6, 2022Updated 3 years ago
- A list of advisory blogs and resources that I have found useful so far.☆22Nov 25, 2020Updated 5 years ago
- Source code for "Improving Robustness of Neural Machine Translation with Multi-task Learning"☆19Aug 15, 2019Updated 6 years ago
- TUFS Asian Language Parallel Corpus☆52May 1, 2023Updated 2 years ago
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021☆29Feb 1, 2023Updated 3 years ago
- ☆22Apr 8, 2022Updated 3 years ago
- Collection of auditory models.☆33Feb 4, 2024Updated 2 years ago
- UrIII Period (Sumerian Language) Information Extraction pipeline, including Named Entity Recognition, Part Of Speech Tagging and Machine …☆31Apr 6, 2025Updated 10 months ago
- An English to Hindi Dictionary☆28Sep 30, 2020Updated 5 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies)☆34Jun 29, 2025Updated 8 months ago
- ☆30Nov 1, 2019Updated 6 years ago
- ☆82Jan 30, 2026Updated last month
- This repository is about how to build an SQLite version of the Arabic WordNet database.☆10Mar 19, 2019Updated 6 years ago
- Minangkabau NLP corpus. PACLIC 2020☆10Jun 7, 2021Updated 4 years ago
- I use various Data Science and machine learning techniques to analyze customer data using STP framework. I preprocessed the data, perform…☆12Apr 26, 2020Updated 5 years ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…☆81Aug 31, 2021Updated 4 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa.☆42Oct 13, 2022Updated 3 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests☆41Oct 14, 2022Updated 3 years ago
- Named Entity (NER) annotations of the Hebrew Treebank (Haaretz newspaper) corpus, including: morpheme and token level NER labels, nested …☆10Dec 27, 2021Updated 4 years ago
- This repository contains my models that have been trained to translate from Kikuyu to Kiswahili. It also contains the dataset used for the…☆13Sep 10, 2018Updated 7 years ago
- MG top-down beam parsing☆13Jul 2, 2018Updated 7 years ago
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github…☆14Dec 19, 2022Updated 3 years ago
- Introduction to Algorithms, Third Edition.☆10Apr 2, 2017Updated 8 years ago
- Resources and tools for Indian language Natural Language Processing☆628Jun 7, 2024Updated last year
- Source Code for "Improved Embeddings for Learning Prerequisite Chains" (CPSC 490 - Senior Project)☆11May 2, 2019Updated 6 years ago
- Fake news detection using Naïve Bayes in Python along with confusion matrix calculated using sklearn.☆10Aug 16, 2021Updated 4 years ago
- Inspirational post IDs collected from Reddit using pushshift.io and RoBERTa☆10Jan 18, 2024Updated 2 years ago
- ☆36Aug 25, 2022Updated 3 years ago