google-research-datasets / tydiqa
TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the training and evaluation of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset.
☆317Updated 5 years ago
Alternatives and similar repositories for tydiqa
Users who are interested in tydiqa are comparing it to the repositories listed below.
- New dataset☆311Updated 4 years ago
- ☆206Updated 4 years ago
- ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv…☆459Updated last year
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue☆286Updated 2 years ago
- Adversarial Natural Language Inference Benchmark☆397Updated 3 years ago
- Interpretable Evaluation for (Almost) All NLP Tasks☆195Updated 4 months ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…☆560Updated 4 years ago
- Code to reproduce the experiments from the paper.☆103Updated 2 years ago
- Scripts and links to recreate the ELI5 dataset.☆326Updated 4 years ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically …☆191Updated 3 years ago
- Unsupervised Question answering via Cloze Translation☆218Updated 3 years ago
- Code and data to support the paper "PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them"☆209Updated 4 years ago
- UnifiedQA: Crossing Format Boundaries With a Single QA System☆444Updated 3 years ago
- Easier Automatic Sentence Simplification Evaluation☆166Updated 2 years ago
- Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019)☆217Updated 4 years ago
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty…☆650Updated 3 years ago
- Yet Another Neural Machine Translation Toolkit☆179Updated 10 months ago
- MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance☆210Updated 2 years ago
- Officially supported AllenNLP models☆557Updated 3 years ago
- Resources for the "SummEval: Re-evaluating Summarization Evaluation" paper☆409Updated last year
- A Natural Language Inference (NLI) model based on Transformers (BERT and ALBERT)☆138Updated 2 years ago
- Interpretable Evaluation for AI Systems☆365Updated 2 years ago
- This repository contains the code for "Generating Datasets with Pretrained Language Models".☆189Updated 4 years ago
- This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences fro…☆162Updated last year
- A repo to explore different NLP tasks which can be solved using T5☆173Updated 5 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)☆386Updated 2 years ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large scale dataset focused on machine reading comprehension and question answerin…☆225Updated 2 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…☆359Updated 3 years ago
- ☆344Updated 4 years ago
- Resources for the NAACL 2018 paper "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents"☆387Updated 2 years ago