google-research-datasets / tydiqa
TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the training and evaluation of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset.
☆301 Updated 4 years ago
Alternatives and similar repositories for tydiqa:
Users who are interested in tydiqa are comparing it to the repositories listed below:
- New dataset ☆303 Updated 3 years ago
- Code and data to support the paper "PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them" ☆202 Updated 3 years ago
- ☆190 Updated 3 years ago
- Scripts and links to recreate the ELI5 dataset. ☆324 Updated 3 years ago
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue ☆282 Updated last year
- Interpretable Evaluation for (Almost) All NLP Tasks ☆195 Updated 2 years ago
- Unsupervised Question answering via Cloze Translation ☆219 Updated 2 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆361 Updated last year
- An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions" ☆118 Updated 2 years ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆557 Updated 3 years ago
- ☆160 Updated 5 years ago
- A Natural Language Inference (NLI) model based on Transformers (BERT and ALBERT) ☆132 Updated last year
- Adversarial Natural Language Inference Benchmark ☆393 Updated 2 years ago
- MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance ☆205 Updated last year
- A tool for holistic analysis of language generations systems ☆468 Updated 3 years ago
- Officially supported AllenNLP models ☆540 Updated 2 years ago
- Full Python implementation of the ROUGE metric, producing same results as in the official perl implementation. ☆157 Updated 5 years ago
- PyTorch original implementation of "Unsupervised Question Decomposition for Question Answering" ☆120 Updated last year
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆362 Updated 3 years ago
- A BART version of an open-domain QA model in a closed-book setup ☆119 Updated 4 years ago
- ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv… ☆440 Updated 6 months ago
- Please see the readme file as well as our 2019 EMNLP paper linked here --> ☆204 Updated 11 months ago
- Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models … ☆231 Updated 2 years ago
- EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data" ☆339 Updated 4 months ago
- Resources for the NAACL 2018 paper "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents" ☆367 Updated 2 years ago
- Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019) ☆205 Updated 3 years ago
- Resources for the MRQA 2019 Shared Task ☆292 Updated 3 years ago
- Topic-Aware Convolutional Neural Networks for Extreme Summarization ☆359 Updated last year
- Pre-Trained Models for ToD-BERT ☆292 Updated last year
- [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o… ☆603 Updated 2 years ago