Exploration-Lab / HLDC
☆13 · Updated 2 months ago
Alternatives and similar repositories for HLDC:
Users who are interested in HLDC are comparing it to the libraries listed below.
- ☆88 · Updated last month
- OpenNyAI is a mission aimed at developing open source software and datasets to catalyze the creation of AI-powered solutions to improve a… ☆39 · Updated 11 months ago
- Code Repository for the IndicXNLI paper. ☆15 · Updated last year
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ☆200 · Updated last year
- This repository contains the HiNER dataset released with our paper at LREC 2022 ☆14 · Updated last year
- A benchmark for code-switched NLP, ACL 2020 ☆74 · Updated 10 months ago
- This repository is dedicated to the development of code-mixed language resources. ☆24 · Updated last year
- ☆44 · Updated 2 years ago
- Yet Another Neural Machine Translation Toolkit ☆179 · Updated 3 weeks ago
- An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch ☆171 · Updated 2 years ago
- Efficient Attention for Long Sequence Processing ☆92 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME ☆93 · Updated 6 months ago
- indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2 ☆125 · Updated last year
- Course for Interpreting ML Models ☆52 · Updated 2 years ago
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021. ☆21 · Updated 2 years ago
- Some notebooks for NLP ☆198 · Updated last year
- A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation of Hinglish code mixed data to Hindi Devanaga… ☆35 · Updated last year
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM" ☆58 · Updated 5 months ago
- Pre-trained, multilingual sequence-to-sequence models for Indian languages ☆46 · Updated 2 years ago
- Resources for cultural NLP research ☆86 · Updated 2 months ago
- IndicGenBench is a high-quality, multilingual, multi-way parallel benchmark for evaluating Large Language Models (LLMs) on 4 user-facing … ☆45 · Updated 6 months ago
- This repository contains the code for dataset curation and finetuning of instruct variant of the Bilingual OpenHathi model. The resultin… ☆23 · Updated last year
- 🔍 A statutory article retrieval dataset in French. (ACL 2022) ☆39 · Updated last year
- An instruction-based benchmark for text improvements. ☆141 · Updated 2 years ago
- ☆15 · Updated last year
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks. ☆101 · Updated 2 weeks ago
- ☆51 · Updated last year
- A monolingual and cross-lingual meta-embedding generation and evaluation framework ☆80 · Updated 2 years ago
- ☆9 · Updated 3 years ago