google-research-datasets / natural-questions
Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.
☆1,090Jul 30, 2021Updated 4 years ago
Alternatives and similar repositories for natural-questions
Users who are interested in natural-questions are comparing it to the libraries listed below:
- Shared repository for open-sourced projects from the Google AI Language team.☆1,747Updated this week
- TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and …☆317May 28, 2020Updated 5 years ago
- Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks.☆1,858Apr 6, 2023Updated 2 years ago
- Library for Knowledge Intensive Language Tasks☆963Mar 31, 2022Updated 3 years ago
- Code and data to support the paper "PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them"☆209Aug 31, 2021Updated 4 years ago
- Scripts and links to recreate the ELI5 dataset.☆326Aug 31, 2021Updated 4 years ago
- Resources for the MRQA 2019 Shared Task☆294Aug 5, 2021Updated 4 years ago
- New dataset☆311Aug 31, 2021Updated 4 years ago
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"☆6,491Jan 14, 2026Updated last month
- ☆31Jun 19, 2020Updated 5 years ago
- An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions"☆120Apr 23, 2022Updated 3 years ago
- ACL2020 Tutorial: Open-Domain Question Answering☆835Jan 1, 2021Updated 5 years ago
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.☆2,070Oct 16, 2025Updated 4 months ago
- Datasets for Question Answering by Search and Reading☆70Jan 19, 2018Updated 8 years ago
- ☆436Feb 4, 2024Updated 2 years ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension and question answerin…☆225Jun 12, 2023Updated 2 years ago
- Code for the TriviaQA reading comprehension dataset☆329Apr 5, 2024Updated last year
- XLNet: Generalized Autoregressive Pretraining for Language Understanding☆6,177May 28, 2023Updated 2 years ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…☆561Jan 4, 2022Updated 4 years ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining.☆2,924Feb 14, 2023Updated 3 years ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.☆2,017Updated this week
- ☆175May 28, 2019Updated 6 years ago
- Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granul…☆1,540May 31, 2023Updated 2 years ago
- Fusion-in-Decoder☆591Oct 4, 2023Updated 2 years ago
- Reading Wikipedia to Answer Open-Domain Questions☆4,478Oct 1, 2023Updated 2 years ago
- Adversarial Natural Language Inference Benchmark☆398May 12, 2022Updated 3 years ago
- This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and …☆508Apr 15, 2020Updated 5 years ago
- An original implementation of EMNLP 2019, "A Discrete Hard EM Approach for Weakly Supervised Question Answering"☆135Jul 3, 2020Updated 5 years ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically …☆192Jun 16, 2022Updated 3 years ago
- Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning☆770Apr 7, 2023Updated 2 years ago
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators☆2,371Mar 23, 2024Updated last year
- docTTTTTquery document expansion model☆374Mar 25, 2023Updated 2 years ago
- An original implementation of ACL 2019, "Multi-hop Reading Comprehension through Question Decomposition and Rescoring"☆138Apr 23, 2022Updated 3 years ago
- An open-source NLP research library, built on PyTorch.☆11,890Nov 22, 2022Updated 3 years ago
- Anserini is a Lucene toolkit for reproducible information retrieval research☆1,096Updated this week
- A novel embedding training algorithm that leverages ANN search and achieved SOTA retrieval on TREC DL 2019 and OpenQA benchmarks☆383Jan 6, 2026Updated last month
- ☆559Apr 26, 2021Updated 4 years ago
- UnifiedQA: Crossing Format Boundaries With a Single QA System☆444May 9, 2022Updated 3 years ago
- Authors' implementation of EMNLP-IJCNLP 2019 paper "Answering Complex Open-domain Questions Through Iterative Query Generation"☆195Oct 29, 2019Updated 6 years ago