google-research-datasets/dakshina

The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the dataset includes a large collection of native script Wikipedia text, a romanization lexicon of words in the native script with attested romanizations, and some full sentence parallel data in both a native script of t…

☆211

Alternatives and similar repositories for dakshina

Users that are interested in dakshina are comparing it to the libraries listed below.

AI4Bharat / indicnlp_catalog
A collaborative catalog of NLP resources for Indic languages
☆638 · Dec 14, 2024 · Updated last year
libindic / indic-trans
The project aims on adding a state-of-the-art transliteration module for cross transliterations among all Indian languages including Engl…
☆275 · Oct 28, 2022 · Updated 3 years ago
AI4Bharat / indic-bart
Pre-trained, multilingual sequence-to-sequence models for Indian languages
☆51 · Jul 20, 2022 · Updated 4 years ago
in-rolls / indicate
transliterate hindi, punjabi to english
☆17 · Updated this week
AI4Bharat / indicTrans
indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2
☆141 · Jan 2, 2024 · Updated 2 years ago
goru001 / inltk
Natural Language Toolkit for Indic Languages aims to provide out of the box support for various NLP tasks that an application developer m…
☆838 · Jan 20, 2024 · Updated 2 years ago
midas-research / bhaav
Dataset of sentences from Hindi stories tagged with different emotion tags
☆11 · Nov 26, 2019 · Updated 6 years ago
anoopkunchukuttan / crowd-indic-transliteration-data
Xlit-Crowd: Hindi-English Transliteration Corpus
☆38 · Feb 17, 2015 · Updated 11 years ago
banglakit / lemmatizer
A rule-based lemmatizer for Bengali / Bangla based written in Python. Under active development.
☆26 · Dec 28, 2019 · Updated 6 years ago
anoopkunchukuttan / indic_nlp_library
Resources and tools for Indian language Natural Language Processing
☆640 · Jun 7, 2024 · Updated 2 years ago
AI4Bharat / Indic-BERT-v1
Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.c…
☆297 · May 11, 2023 · Updated 3 years ago
AI4Bharat / indicnlp_corpus
Description Describes the IndicNLP corpus and associated datasets
☆206 · Apr 16, 2023 · Updated 3 years ago
virtualvinodh / aksharamukha-python
Aksharamukha Python Library
☆62 · Feb 2, 2025 · Updated last year
arijitx / BanglaNLP
Resources and Tool for Bangla language computation
☆14 · Feb 20, 2026 · Updated 5 months ago
sjayakum / nlp-machine-translation
CSCI 544 NLP Research Project - Machine Translation
☆11 · Apr 10, 2017 · Updated 9 years ago
irshadbhat / indic-wx-converter
Python library for converting UTF to WX and vice-versa for Indian languages.
☆47 · May 26, 2022 · Updated 4 years ago
rezacsedu / Classification_Benchmarks_Benglai_NLP
Classification Benchmarks for Under-resourced Bengali Language based on Multichannel Convolutional-LSTM Network
☆20 · Jul 26, 2021 · Updated 5 years ago
Rowan1224 / FakeNews
☆39 · Jan 13, 2024 · Updated 2 years ago
microsoft / GLUECoS
A benchmark for code-switched NLP, ACL 2020
☆76 · May 28, 2024 · Updated 2 years ago
jerinphilip / ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
☆22 · Mar 5, 2022 · Updated 4 years ago
anoopkunchukuttan / geomm
Geometry-aware Multilingual Embeddings
☆26 · Dec 8, 2022 · Updated 3 years ago
ymoslem / MT-Tools
Collection of Common Machine Translation Tools
☆11 · Jul 26, 2022 · Updated 4 years ago
microsoft / MIMICS
MIMICS: A Large-Scale Data Collection for Search Clarification
☆86 · Sep 1, 2020 · Updated 5 years ago
sagorbrur / bnlp
BNLP is a natural language processing toolkit for Bengali Language.
☆309 · Apr 1, 2026 · Updated 3 months ago
sagorbrur / bnaug
Bangla Text Augmentation
☆11 · Aug 30, 2023 · Updated 2 years ago
google / language-resources
Datasets and tools for basic natural language processing.
☆389 · Sep 10, 2021 · Updated 4 years ago
TrigonaMinima / HinglishNLP
☆47 · Jan 23, 2020 · Updated 6 years ago
Speech-Lab-IITM / Hindi-ASR-Challenge
🎯 Speech Recognition Challenge by Speech Lab - IIT Madras
☆10 · Nov 5, 2020 · Updated 5 years ago
neulab / compare-mt
A tool for holistic analysis of language generations systems
☆471 · Sep 22, 2025 · Updated 10 months ago
AI4Bharat / Shoonya
Shoonya - Platform to Annotate and label data at scale.
☆69 · Jun 23, 2026 · Updated last month
dodgejesse / show_your_work
☆11 · Jan 21, 2020 · Updated 6 years ago
neubig / lowresource-nlp-bootcamp-2020
The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020
☆607 · Jun 4, 2020 · Updated 6 years ago
denocris / MHPC-Natural-Language-Processing-Lectures
This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP).
☆33 · Sep 15, 2020 · Updated 5 years ago
AI4Bharat / IndicNLP-Transliteration
Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based models, check: https://github.com/AI4Bharat/…
☆60 · Jul 9, 2021 · Updated 5 years ago
anlausch / LIBERT
Code from the paper "Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity"
☆19 · May 8, 2020 · Updated 6 years ago
KSMubasshir / bd-newspaper-crawlers
A collection of Bangla newspaper and blog crawlers. Can be used to mine bangla text data for Natural Language Processing tasks.
☆18 · Jan 30, 2023 · Updated 3 years ago
libindic / Transliteration
Transliteration module for Indian Languages
☆79 · Oct 24, 2025 · Updated 9 months ago
project-anuvaad / anuvaad-parallel-corpus
☆24 · May 5, 2022 · Updated 4 years ago
mrinaldhar / en-hi-codemixed-corpus
Repository for the English-Hindi Codemixed to Monolingual English Parallel Corpus
☆13 · Feb 17, 2019 · Updated 7 years ago