Exploration-Lab / HLDC
☆13 · Updated 3 weeks ago
Alternatives and similar repositories for HLDC:
Users who are interested in HLDC are comparing it to the libraries listed below.
- ☆87 · Updated last week
- OpenNyAI is a mission aimed at developing open source software and datasets to catalyze the creation of AI-powered solutions to improve a… ☆37 · Updated 10 months ago
- indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2 ☆122 · Updated last year
- Yet Another Neural Machine Translation Toolkit ☆177 · Updated 7 months ago
- An instruction-based benchmark for text improvements. ☆141 · Updated 2 years ago
- Efficient Attention for Long Sequence Processing ☆92 · Updated last year
- Long Document Summarization Papers ☆141 · Updated last year
- Code Repository for the IndicXNLI paper. ☆14 · Updated last year
- A benchmark for code-switched NLP, ACL 2020 ☆74 · Updated 8 months ago
- An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch ☆171 · Updated 2 years ago
- Some notebooks for NLP ☆194 · Updated last year
- MAFAND-MT ☆55 · Updated 7 months ago
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ☆195 · Updated last year
- Describes the IndicNLP corpus and associated datasets ☆162 · Updated last year
- This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 4… ☆262 · Updated 10 months ago
- A repo to explore different NLP tasks which can be solved using T5 ☆172 · Updated 4 years ago
- This repository contains the HiNER dataset released with our paper at LREC 2022 ☆14 · Updated last year
- This repository contains the code for "Generating Datasets with Pretrained Language Models". ☆187 · Updated 3 years ago
- A Python package to compute HONEST, a score to measure hurtful sentence completions in language models. Published at NAACL 2021. ☆21 · Updated 2 years ago
- ☆93 · Updated 11 months ago
- A simple recipe for training and inferencing Transformer architecture for Multi-Task Learning on custom datasets. You can find two approa… ☆94 · Updated 2 years ago
- This repository is dedicated to development of code-mixed language resources. ☆24 · Updated last year
- Master thesis with code investigating methods for incorporating long-context reasoning in low-resource languages, without the need to pre… ☆33 · Updated 3 years ago
- [EMNLP 2021] Improving and Simplifying Pattern Exploiting Training ☆154 · Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆67 · Updated 11 months ago
- TSAR2022 Shared Task on Lexical Simplification - Datasets and Evaluation scripts ☆10 · Updated 2 years ago
- Pretraining, fine-tuning and evaluation scripts for IndicBERT-v2 and IndicXTREME ☆89 · Updated 5 months ago
- Shared code for training sentence embeddings with Flax / JAX ☆27 · Updated 3 years ago
- Long-context pretrained encoder-decoder models ☆94 · Updated 2 years ago