google-research-datasets / ToTTo
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. We hope it can serve as a useful research benchmark for high-precision conditional text generation.
☆446Updated 7 months ago
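The task described above (a table plus a set of highlighted cells as input, one sentence as output) can be illustrated with a minimal Python sketch. The record below is a made-up example, and its field names (`table`, `highlighted_cells`, `value`, `is_header`) are assumptions chosen for illustration, not a confirmed copy of the dataset schema. The helper `linearize_highlights` is likewise hypothetical: it assumes the first table row is the header row and that no cells span multiple rows or columns.

```python
from typing import Any, Dict, List

# Hypothetical, simplified record. Field names are illustrative assumptions,
# not a guaranteed match for the released ToTTo files.
example: Dict[str, Any] = {
    "table": [
        [{"value": "Year", "is_header": True}, {"value": "Title", "is_header": True}],
        [{"value": "2019", "is_header": False}, {"value": "First Film", "is_header": False}],
        [{"value": "2021", "is_header": False}, {"value": "Second Film", "is_header": False}],
    ],
    # Each entry is a [row_index, column_index] pair into "table".
    "highlighted_cells": [[1, 0], [1, 1]],
}


def highlighted_values(record: Dict[str, Any]) -> List[str]:
    """Return the text of every highlighted cell, in the listed order."""
    table = record["table"]
    return [table[row][col]["value"] for row, col in record["highlighted_cells"]]


def linearize_highlights(record: Dict[str, Any]) -> str:
    """Pair each highlighted cell with its column header (assumes row 0 is the header)."""
    header = [cell["value"] for cell in record["table"][0]]
    table = record["table"]
    parts = [
        f"{header[col]}: {table[row][col]['value']}"
        for row, col in record["highlighted_cells"]
    ]
    return " | ".join(parts)


if __name__ == "__main__":
    print(highlighted_values(example))
    print(linearize_highlights(example))
```

A string such as the one returned by `linearize_highlights` is one plausible way to feed the highlighted cells to a sequence-to-sequence model; real tables with merged cells or multi-row headers would need more careful handling.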
Alternatives and similar repositories for ToTTo:
Users who are interested in ToTTo are comparing it to the libraries listed below.
- TyDi QA contains 200k human-annotated question-answer pairs in 11 Typologically Diverse languages, written without seeing the answer and …☆305Updated 4 years ago
- [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o…☆604Updated 2 years ago
- UnifiedQA: Crossing Format Boundaries With a Single QA System☆435Updated 2 years ago
- Scripts and links to recreate the ELI5 dataset.☆325Updated 3 years ago
- Officially supported AllenNLP models☆541Updated 2 years ago
- MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension and question answerin…☆216Updated last year
- Adversarial Natural Language Inference Benchmark☆393Updated 2 years ago
- Library for Knowledge Intensive Language Tasks☆939Updated 3 years ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…☆557Updated 3 years ago
- New dataset☆304Updated 3 years ago
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations☆783Updated 11 months ago
- Resources for the "SummEval: Re-evaluating Summarization Evaluation" paper☆391Updated 10 months ago
- DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue☆283Updated last year
- An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.)☆445Updated 2 months ago
- KnowBert -- Knowledge Enhanced Contextual Word Representations☆376Updated 4 years ago
- Code and data to support the paper "PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them"☆202Updated 3 years ago
- This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural lan…☆598Updated 3 years ago
- Semantics-aware BERT for Language Understanding (AAAI 2020)☆287Updated 2 years ago
- Models to perform neural summarization (extractive and abstractive) using machine learning transformers and a tool to convert abstractive…☆431Updated 2 years ago
- Data and Code for ICLR2020 Paper "TabFact: A Large-scale Dataset for Table-based Fact Verification"☆395Updated last year
- Autoregressive Entity Retrieval☆786Updated last year
- a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini☆350Updated last year
- Dataset and code for EMNLP 2020 paper "HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data"☆228Updated last year
- Code associated with the Don't Stop Pretraining ACL 2020 paper☆530Updated 3 years ago
- Interpretable Evaluation for (Almost) All NLP Tasks☆195Updated 2 years ago
- MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance☆204Updated last year
- Pre-Trained Models for ToD-BERT☆292Updated last year
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…☆362Updated 3 years ago