google-research-datasets/natural-questions

Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.

☆1,135

Alternatives and similar repositories for natural-questions

Users that are interested in natural-questions are comparing it to the libraries listed below.

google-research / language
Shared repository for open-sourced projects from the Google AI Language team.
☆1,787 · Jun 10, 2026 · Updated last month
facebookresearch / DPR
Dense Passage Retriever is a set of tools and models for the open-domain Q&A task.
☆1,869 · Apr 6, 2023 · Updated 3 years ago
mrqa / MRQA-Shared-Task-2019
Resources for the MRQA 2019 Shared Task
☆294 · Aug 5, 2021 · Updated 4 years ago
google-research-datasets / tydiqa
TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and …
☆319 · May 28, 2020 · Updated 6 years ago
facebookresearch / PAQ
Code and data to support the paper "PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them"
☆211 · Aug 31, 2021 · Updated 4 years ago
bdhingra / quasar
Datasets for Question Answering by Search and Reading
☆70 · Jan 19, 2018 · Updated 8 years ago
allenai / document-qa
☆437 · Feb 4, 2024 · Updated 2 years ago
facebookresearch / KILT
Library for Knowledge Intensive Language Tasks
☆978 · Mar 31, 2022 · Updated 4 years ago
danqi / acl2020-openqa-tutorial
ACL2020 Tutorial: Open-Domain Question Answering
☆835 · Jan 1, 2021 · Updated 5 years ago
zihangdai / xlnet
XLNet: Generalized Autoregressive Pretraining for Language Understanding
☆6,180 · May 28, 2023 · Updated 3 years ago
seominjoon / piqa
Phrase-Indexed Question Answering (PIQA)
☆93 · Apr 27, 2019 · Updated 7 years ago
mandarjoshi90 / triviaqa
Code for the TriviaQA reading comprehension dataset
☆339 · Apr 5, 2024 · Updated 2 years ago
facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
☆2,925 · Feb 14, 2023 · Updated 3 years ago
google-research / text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
☆6,536 · Jul 8, 2026 · Updated last week
hotpotqa / hotpot
☆595 · Apr 26, 2021 · Updated 5 years ago
google-deepmind / narrativeqa
This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and …
☆518 · Apr 15, 2020 · Updated 6 years ago
facebookresearch / MLQA
New dataset
☆311 · Aug 31, 2021 · Updated 4 years ago
qipeng / golden-retriever
Authors' implementation of EMNLP-IJCNLP 2019 paper "Answering Complex Open-domain Questions Through Iterative Query Generation"
☆196 · Oct 29, 2019 · Updated 6 years ago
nelson-liu / contextual-repr-analysis
A toolkit for evaluating the linguistic knowledge and transferability of contextual representations. Code for "Linguistic Knowledge and T…
☆212 · Oct 20, 2021 · Updated 4 years ago
allenai / allennlp
An open-source NLP research library, built on PyTorch.
☆11,888 · Nov 22, 2022 · Updated 3 years ago
shmsw25 / DecompRC
An original implementation of ACL 2019, "Multi-hop Reading Comprehension through Question Decomposition and Rescoring"
☆138 · Apr 23, 2022 · Updated 4 years ago
allenai / bi-att-flow
Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granul…
☆1,545 · May 31, 2023 · Updated 3 years ago
facebookresearch / DrQA
Reading Wikipedia to Answer Open-Domain Questions
☆4,471 · Oct 1, 2023 · Updated 2 years ago
efficientqa / nq-open
☆31 · Jun 19, 2020 · Updated 6 years ago
google-research-datasets / paws
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…
☆570 · Jan 4, 2022 · Updated 4 years ago
microsoft / MSMARCO-Question-Answering
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension and question answerin…
☆234 · Jun 12, 2023 · Updated 3 years ago
facebookresearch / FiD
Fusion-in-Decoder
☆596 · Oct 4, 2023 · Updated 2 years ago
namisan / mt-dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
☆2,259 · Mar 7, 2024 · Updated 2 years ago
facebookresearch / ELI5
Scripts and links to recreate the ELI5 dataset.
☆324 · Aug 31, 2021 · Updated 4 years ago
nyu-mll / jiant
jiant is an NLP toolkit
☆1,675 · Jul 6, 2023 · Updated 3 years ago
beir-cellar / beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
☆2,246 · Oct 16, 2025 · Updated 9 months ago
thunlp / ERNIE
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
☆1,419 · Jan 10, 2024 · Updated 2 years ago
shmsw25 / AmbigQA
An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous Open-domain Questions"
☆123 · Apr 23, 2022 · Updated 4 years ago
google-research / electra
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
☆2,367 · Mar 23, 2024 · Updated 2 years ago
princeton-nlp / DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; [EMNLP 2021] Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o…
☆607 · Jun 15, 2022 · Updated 4 years ago
castorini / pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
☆2,100 · Updated this week
microsoft / ANCE
A novel embedding training algorithm that leverages ANN search and achieves SOTA retrieval on TREC DL 2019 and OpenQA benchmarks
☆385 · Jan 6, 2026 · Updated 6 months ago
salesforce / decaNLP
The Natural Language Decathlon: A Multitask Challenge for NLP
☆2,338 · May 1, 2025 · Updated last year
luyug / Condenser
EMNLP 2021 - Pre-training architectures for dense retrieval
☆256 · Mar 18, 2022 · Updated 4 years ago