AI4Bharat / IndicLID
Language Identification for Indian languages
☆10 · Updated 7 months ago
Related projects:
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM" · ☆52 · Updated last month
- IndicGenBench is a high-quality, multilingual, multi-way parallel benchmark for evaluating Large Language Models (LLMs) on 4 user-facing … · ☆41 · Updated 2 weeks ago
- ☆16 · Updated 6 months ago
- A blueprint for creating Pretraining and Fine-Tuning datasets for Indic languages · ☆88 · Updated last month
- A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation of Hinglish code mixed data to Hindi Devanaga… · ☆33 · Updated 8 months ago
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME · ☆67 · Updated 2 weeks ago
- Translation models for 22 scheduled languages of India · ☆216 · Updated 3 weeks ago
- Towards Building Text-To-Speech Systems for the Next Billion Users - Microsoft Research Intern Work - Accepted at ICASSP 2023 · ☆46 · Updated last year
- A simple, consistent and extendable toolkit for IndicTrans2 · ☆16 · Updated 3 weeks ago
- Text to Speech for Indic languages · ☆49 · Updated 2 years ago
- Transliteration models for 21 Indic languages · ☆69 · Updated 11 months ago
- This repository contains the code for dataset curation and finetuning of instruct variant of the Bilingual OpenHathi model. The resultin… · ☆23 · Updated 8 months ago
- Code related to training/fine-tuning Hindi/Hinglish models. · ☆47 · Updated 8 months ago
- indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2 · ☆116 · Updated 8 months ago
- Hinglish Text Classification · ☆30 · Updated last year
- FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists · ☆18 · Updated last month
- Generate large textual corpora for almost any language by crawling the web · ☆11 · Updated 7 months ago
- Contains materials for my talk "You don't know TensorFlow". · ☆9 · Updated last year
- MultiOCR, an interface that connects multiple open-source OCR engines and various cloud OCR services. · ☆31 · Updated last year
- Repository for fine-tuning gemma models using unsloth for indic languages · ☆80 · Updated 6 months ago
- Quantization of LLMs and benchmarking. · ☆10 · Updated 5 months ago
- Using short models to classify long texts · ☆20 · Updated last year
- Shoonya - Platform to Annotate and label data at scale. · ☆48 · Updated 2 weeks ago
- ☆111 · Updated last week
- Pretraining, fine-tuning and evaluation scripts for Indic-Wav2Vec2 · ☆78 · Updated 6 months ago
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… · ☆33 · Updated last year
- ☆28 · Updated 11 months ago
- Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR · ☆44 · Updated 2 months ago
- Transcribe your videos and translate them into Indic languages. · ☆26 · Updated this week
- Scripts to convert datasets from various sources to Hugging Face Datasets. · ☆57 · Updated last year