Exploration-Lab / HLDC
☆13Updated 2 months ago
Alternatives and similar repositories for HLDC:
Users who are interested in HLDC are comparing it to the libraries listed below:
- ☆90Updated 2 months ago
- OpenNyAI is a mission aimed at developing open source software and datasets to catalyze the creation of AI-powered solutions to improve a…☆40Updated last year
- Efficient Attention for Long Sequence Processing☆93Updated last year
- This repository contains the HiNER dataset released with our paper at LREC 2022☆15Updated last year
- A benchmark for code-switched NLP, ACL 2020☆74Updated 10 months ago
- indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2☆125Updated last year
- ☆44Updated 2 years ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023☆68Updated last year
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021.☆21Updated 2 weeks ago
- Course for Interpreting ML Models☆52Updated 2 years ago
- Code Repository for the IndicXNLI paper.☆15Updated last year
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME☆94Updated 2 weeks ago
- SemEval 2024 Task 1: Textual Semantic Relatedness☆26Updated 10 months ago
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM"☆59Updated 6 months ago
- ☆16Updated last year
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English☆201Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset.☆93Updated 2 years ago
- A pipeline for transliteration, spell correction, POS tagging and word sense disambiguation of Hinglish code-mixed data to Hindi Devanaga…☆36Updated last year
- Describes the IndicNLP corpus and associated datasets☆167Updated 2 years ago
- An instruction-based benchmark for text improvements.☆141Updated 2 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023☆100Updated last year
- Resources for cultural NLP research☆92Updated this week
- This repository contains materials for the SIGIR 2022 tutorial on opinion summarization.☆34Updated 2 years ago
- An open-source text summarization toolkit for non-experts. EMNLP'2021 Demo☆277Updated last year
- Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022)☆113Updated 2 years ago
- This repository contains the code for "Generating Datasets with Pretrained Language Models".☆188Updated 3 years ago
- Generate large textual corpora for almost any language by crawling the web☆12Updated last year
- This repository is dedicated to the development of code-mixed language resources.☆25Updated last year
- Pre-trained, multilingual sequence-to-sequence models for Indian languages☆46Updated 2 years ago
- Explainable Zero-Shot Topic Extraction☆62Updated 8 months ago