Exploration-Lab / HLDC
☆13Updated 2 years ago
Alternatives and similar repositories for HLDC:
Users who are interested in HLDC compare it to the repositories listed below
- OpenNyAI is a mission aimed at developing open source software and datasets to catalyze the creation of AI-powered solutions to improve a…☆36Updated 9 months ago
- ☆85Updated last year
- This repository contains the HiNER dataset released with our paper at LREC 2022☆15Updated last year
- A benchmark for code-switched NLP, ACL 2020☆74Updated 7 months ago
- A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation of Hinglish code mixed data to Hindi Devanaga…☆35Updated last year
- indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2☆122Updated last year
- Yet Another Neural Machine Translation Toolkit☆176Updated 6 months ago
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English☆193Updated last year
- Code Repository for the IndicXNLI paper.☆14Updated last year
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021.☆21Updated 2 years ago
- SemEval 2024 Task 1: Textual Semantic Relatedness☆24Updated 6 months ago
- Describes the IndicNLP corpus and associated datasets☆158Updated last year
- An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch☆169Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset.☆93Updated last year
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM"☆55Updated 2 months ago
- A reading list of up-to-date papers on NLP for Social Good.☆291Updated last year
- Pre-trained, multilingual sequence-to-sequence models for Indian languages☆45Updated 2 years ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023☆67Updated 10 months ago
- This repository is dedicated to development of code-mixed language resources.☆23Updated last year
- An instruction-based benchmark for text improvements.☆140Updated 2 years ago
- The IIT Bombay English-Hindi Parallel Corpus☆18Updated 2 years ago
- ☆39Updated last year
- Efficient Attention for Long Sequence Processing☆91Updated last year
- ☆16Updated 2 years ago
- ☆16Updated 10 months ago
- Generate large textual corpora for almost any language by crawling the web☆12Updated 11 months ago
- ☆14Updated 2 years ago
- MAFAND-MT☆55Updated 6 months ago
- SeeGULL is a broad-coverage stereotype dataset in English containing stereotypes about identity groups spanning 178 countries across 8 di…☆33Updated last year
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.☆96Updated last month