google-research-datasets / tydiqa
TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the training and evaluation of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset.
☆300 Updated 4 years ago
Alternatives and similar repositories for tydiqa:
Users that are interested in tydiqa are comparing it to the libraries listed below
- New dataset ☆302 Updated 3 years ago
- ☆187 Updated 3 years ago
- Interpretable Evaluation for (Almost) All NLP Tasks ☆195 Updated 2 years ago
- Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019) ☆205 Updated 3 years ago
- Code and data to support the paper "PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them" ☆202 Updated 3 years ago
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue ☆283 Updated last year
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆555 Updated 3 years ago
- Scripts and links to recreate the ELI5 dataset. ☆320 Updated 3 years ago
- EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data" ☆337 Updated 3 months ago
- Please see the readme file as well as our 2019 EMNLP paper linked here --> ☆201 Updated 10 months ago
- Unsupervised Question answering via Cloze Translation ☆219 Updated 2 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆362 Updated 3 years ago
- A Natural Language Inference (NLI) model based on Transformers (BERT and ALBERT) ☆132 Updated last year
- MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance ☆203 Updated last year
- Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models … ☆231 Updated last year
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆357 Updated last year
- Dataset for NAACL 2021 paper: "DART: Open-Domain Structured Data Record to Text Generation" ☆151 Updated 2 years ago
- ☆344 Updated 3 years ago
- Resources for the NAACL 2018 paper "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents" ☆364 Updated last year
- MS MARCO (Microsoft Machine Reading Comprehension) is a large scale dataset focused on machine reading comprehension and question answerin… ☆213 Updated last year
- A tool for holistic analysis of language generation systems ☆467 Updated 2 years ago
- [EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction ☆119 Updated 3 years ago
- The official tool for creating proceedings for conferences of the Association for Computational Linguistics (ACL). ☆222 Updated last month
- CrossWeigh: Training Named Entity Tagger from Imperfect Annotations ☆177 Updated 7 months ago
- Corpora for evaluating NLU Services/Platforms such as Dialogflow, LUIS, Watson, Rasa etc. ☆110 Updated 2 years ago
- Collection of NLP model explanations and accompanying analysis tools ☆145 Updated last year
- UnifiedQA: Crossing Format Boundaries With a Single QA System ☆433 Updated 2 years ago
- Code to reproduce the experiments from the paper. ☆101 Updated last year
- An elaborate and exhaustive paper list for Named Entity Recognition (NER) ☆394 Updated 3 years ago
- SummVis is an interactive visualization tool for text summarization. ☆252 Updated 2 years ago