google-research-datasets / natural-questions
Natural Questions (NQ) contains real user questions issued to Google search, and answers found in Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question answering systems.
☆940 Updated 3 years ago
Related projects
Alternatives and complementary repositories for natural-questions
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆631 Updated last year
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆555 Updated 2 years ago
- Library for Knowledge Intensive Language Tasks ☆916 Updated 2 years ago
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆697 Updated last year
- Officially supported AllenNLP models ☆528 Updated last year
- Shared repository for open-sourced projects from the Google AI Language team. ☆1,628 Updated 3 weeks ago
- Scripts and links to recreate the ELI5 dataset. ☆319 Updated 3 years ago
- Code for using and evaluating SpanBERT. ☆891 Updated last year
- jiant is an nlp toolkit ☆1,647 Updated last year
- A full Python Implementation of the ROUGE Metric (not a wrapper) ☆670 Updated this week
- ACL2020 Tutorial: Open-Domain Question Answering ☆837 Updated 3 years ago
- ☆451 Updated 3 years ago
- [DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations ☆773 Updated 3 years ago
- The Schema-Guided Dialogue Dataset ☆549 Updated last year
- BERT for Coreference Resolution ☆445 Updated last year
- Plug and Play Language Model implementation. Allows to steer topic and attributes of GPT-2 models. ☆1,132 Updated 9 months ago
- High-accuracy NLP parser with models for 11 languages. ☆871 Updated 2 years ago
- A summary of must-read papers for Neural Question Generation (NQG) ☆584 Updated 3 years ago
- ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv… ☆436 Updated 2 months ago
- Evaluating Cross-lingual Sentence Representations ☆442 Updated 3 years ago
- Resources for the NAACL 2018 paper "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents" ☆357 Updated last year
- UnifiedQA: Crossing Format Boundaries With a Single QA System ☆428 Updated 2 years ago
- [ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723 ☆721 Updated 2 years ago
- 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy ☆1,352 Updated 5 months ago
- A python tool for evaluating the quality of sentence embeddings. ☆2,087 Updated 8 months ago
- A Python framework for sequence labeling evaluation (named-entity recognition, pos tagging, etc...) ☆1,097 Updated 2 months ago
- Code to obtain the CNN / Daily Mail dataset (non-anonymized) for summarization ☆635 Updated 2 years ago
- [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o… ☆605 Updated 2 years ago
- Code for the TriviaQA reading comprehension dataset ☆292 Updated 7 months ago
- A tool for holistic analysis of language generations systems ☆467 Updated 2 years ago