TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and without the use of translation, and is designed for the training and evaluation of automatic question answering systems. This repository provides evaluation code and a baseline system for the dataset.
☆317 · May 28, 2020 · Updated 5 years ago
Alternatives and similar repositories for tydiqa
Users who are interested in tydiqa are comparing it to the libraries listed below.
- New dataset ☆311 · Aug 31, 2021 · Updated 4 years ago
- This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering". ☆80 · Jun 3, 2021 · Updated 4 years ago
- ☆207 · Nov 12, 2021 · Updated 4 years ago
- ReConsider is a re-ranking model that re-ranks the top-K (passage, answer-span) predictions of an Open-Domain QA Model like DPR (Karpukhi… ☆49 · Apr 26, 2021 · Updated 4 years ago
- Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is design… ☆1,092 · Jul 30, 2021 · Updated 4 years ago
- Dataset and baseline for ACL 2019 paper "XQA: A Cross-lingual Open-domain Question Answering Dataset" ☆89 · Nov 16, 2021 · Updated 4 years ago
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆80 · Feb 16, 2022 · Updated 4 years ago
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆650 · Jan 4, 2023 · Updated 3 years ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically … ☆192 · Jun 16, 2022 · Updated 3 years ago
- Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval ☆43 · Jun 12, 2023 · Updated 2 years ago
- This dataset contains human judgements about answer equivalence. The data is based on SQuAD (Stanford Question Answering Dataset), and co… ☆27 · Oct 24, 2022 · Updated 3 years ago
- Library for Knowledge Intensive Language Tasks ☆965 · Mar 31, 2022 · Updated 3 years ago
- The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval". ☆28 · Jun 19, 2021 · Updated 4 years ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆561 · Jan 4, 2022 · Updated 4 years ago
- Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering ☆175 · Jun 6, 2021 · Updated 4 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆31 · Jul 9, 2020 · Updated 5 years ago
- An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions" ☆120 · Apr 23, 2022 · Updated 3 years ago
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ☆2,371 · Mar 23, 2024 · Updated last year
- ☆1,297 · Dec 15, 2022 · Updated 3 years ago
- Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering. ☆1,752 · Dec 20, 2023 · Updated 2 years ago
- QED: A Framework and Dataset for Explanations in Question Answering ☆119 · Aug 3, 2021 · Updated 4 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆389 · Nov 7, 2023 · Updated 2 years ago
- A BART version of an open-domain QA model in a closed-book setup ☆119 · Aug 13, 2020 · Updated 5 years ago
- UnifiedQA: Crossing Format Boundaries With a Single QA System ☆445 · May 9, 2022 · Updated 3 years ago
- Code associated with the Don't Stop Pretraining ACL 2020 paper ☆539 · Nov 15, 2021 · Updated 4 years ago
- The official implementation of ICLR 2020, "Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering". ☆435 · Jul 25, 2024 · Updated last year
- LAnguage Model Analysis ☆1,390 · Jul 7, 2024 · Updated last year
- Dense Passage Retriever - is a set of tools and models for open domain Q&A task. ☆1,860 · Apr 6, 2023 · Updated 2 years ago
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆786 · Aug 4, 2023 · Updated 2 years ago
- EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering ☆38 · Nov 7, 2021 · Updated 4 years ago
- Shared repository for open-sourced projects from the Google AI Language team. ☆1,749 · Feb 20, 2026 · Updated last week
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue ☆286 · Jul 6, 2023 · Updated 2 years ago
- ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv… ☆461 · Sep 11, 2024 · Updated last year
- PyTorch original implementation of "Unsupervised Question Decomposition for Question Answering" ☆122 · Aug 11, 2023 · Updated 2 years ago
- Method to improve inference time for BERT. This is an implementation of the paper titled "PoWER-BERT: Accelerating BERT Inference via Pro… ☆62 · Sep 17, 2025 · Updated 5 months ago
- Neural Question Generation using the SQuAD and NewsQA datasets ☆110 · Dec 8, 2022 · Updated 3 years ago
- Adversarial Natural Language Inference Benchmark ☆398 · May 12, 2022 · Updated 3 years ago
- Neural text-to-text question generation ☆216 · Nov 13, 2020 · Updated 5 years ago
- Code to support the paper "Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets" ☆65 · Aug 31, 2021 · Updated 4 years ago