google-research-datasets/ToTTo

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. We hope it can serve as a useful research benchmark for high-precision conditional text generation.

☆465
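
The task description above pairs a table with a set of highlighted cells. Here is a minimal Python sketch of what such a controlled input could look like, assuming a simplified table stored as a list of rows of strings and highlighted cells given as (row, column) pairs. The `<title>`/`<cells>` markers, the helper names, and the example table are hypothetical and chosen only for illustration. They are not ToTTo's actual JSON schema or official preprocessing.

```python
from typing import List, Tuple

# Hypothetical, simplified table: a list of rows, each row a list of cell strings.
# (Not ToTTo's real format; illustration only.)
Table = List[List[str]]


def highlighted_values(table: Table, highlighted: List[Tuple[int, int]]) -> List[str]:
    """Return the values of the highlighted (row, column) cells, in the given order."""
    return [table[r][c] for r, c in highlighted]


def linearize(page_title: str, table: Table, highlighted: List[Tuple[int, int]]) -> str:
    """Flatten the page title and only the highlighted cells into one input string.

    The <title>/<cells> markers are an assumed convention, not an official one.
    """
    cells = " | ".join(highlighted_values(table, highlighted))
    return f"<title> {page_title} <cells> {cells}"


if __name__ == "__main__":
    # Made-up example table; not taken from the dataset.
    table = [["Year", "Title", "Role"], ["2010", "Film A", "Lead"]]
    print(linearize("Example page", table, [(1, 0), (1, 1)]))
    # prints: <title> Example page <cells> 2010 | Film A
```

Restricting the input to the highlighted cells is what makes the generation controlled: the target sentence should describe those cells rather than the whole table.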

Alternatives and similar repositories for ToTTo

Users interested in ToTTo are comparing it to the repositories listed below.

wenhuchen / LogicNLG
The data and code for ACL2020 paper "Logical Natural Language Generation from Open-Domain Tables"
☆166 · Updated Oct 8, 2022
Yale-LILY / dart
Dataset for NAACL 2021 paper: "DART: Open-Domain Structured Data Record to Text Generation"
☆158 · Updated Nov 21, 2022
yxuansu / PlanGen
[EMNLP'21] Plan-then-Generate: Controlled Data-to-Text Generation via Planning
☆76 · Updated Jun 15, 2022
wenhuchen / Table-Fact-Checking
Data and Code for ICLR2020 Paper "TabFact: A Large-scale Dataset for Table-based Fact Verification"
☆416 · Updated Sep 19, 2023
google-research / tapas
End-to-end neural table-text understanding models.
☆1,204 · Updated Jul 22, 2024
google-research / language
Shared repository for open-sourced projects from the Google AI Language team.
☆1,787 · Updated Jun 10, 2026
tuetschek / e2e-cleaning
Cleaned E2E NLG Challenge data + supporting scripts
☆24 · Updated Jan 19, 2021
KaijuML / data-to-text-hierarchical
Code for A Hierarchical Model for Data-to-Text Generation (Rebuffel, Soulier, Scoutheeten, Gallinari; ECIR 2020)
☆81 · Updated Dec 4, 2023
wenhuchen / HybridQA
Dataset and code for EMNLP2020 paper "HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data"
☆247 · Updated Jun 3, 2023
zhongwanjun / CARP
Code for the table-based open-domain question answering project, with paper title: "Reasoning over Hybrid Chain for Table-and-Text Open D…
☆12 · Updated Sep 16, 2022
Yale-LILY / FeTaQA
Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering"
☆90 · Updated May 11, 2023
czyssrs / Logic2Text
Data and code for EMNLP 2020 paper "Logic2Text: High-Fidelity Natural Language Generation from Logical Forms"
☆71 · Updated Mar 24, 2023
microsoft / HiTab
[ACL 2022] A hierarchical table dataset for question answering and data-to-text generation.
☆109 · Updated Dec 16, 2025
tuetschek / e2e-metrics
E2E NLG Challenge Evaluation metrics
☆93 · Updated Aug 17, 2020
facebookresearch / KILT
Library for Knowledge Intensive Language Tasks
☆978 · Updated Mar 31, 2022
luka-group / Lattice
[NAACL 2022] Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning.
☆56 · Updated Apr 1, 2024
czyssrs / Few-Shot-NLG
Code and Data for ACL 2020 paper "Few-Shot NLG with Pre-Trained Language Model"
☆188 · Updated May 23, 2025
facebookresearch / TaBERT
This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural lan…
☆613 · Updated Aug 26, 2021
wenhuchen / KGPT
Code and Data for EMNLP2020 Paper "KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation"
☆147 · Updated Jun 6, 2021
facebookresearch / SentAugment
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…
☆359 · Updated Feb 22, 2022
facebookresearch / anli
Adversarial Natural Language Inference Benchmark
☆402 · Updated May 12, 2022
wenhuchen / OTT-QA
Code and Data for ICLR2021 Paper "Open Question Answering over Tables and Text"
☆163 · Updated Jan 2, 2024
aistairc / rotowire-modified
Script for generating the rotowire-modified dataset (Iso et al; ACL 2019)
☆12 · Updated Sep 19, 2021
microsoft / Table-Pretraining
ICLR 2022 Paper, SOTA Table Pre-training Model, TAPEX: Table Pre-training via Learning a Neural SQL Executor
☆300 · Updated Feb 6, 2023
KaijuML / dtt-multi-branch
Code for Controlling Hallucinations at Word Level in Data-to-Text Generation (C. Rebuffel, M. Roberti, L. Soulier, G. Scoutheeten, R. Can…
☆16 · Updated Jun 12, 2023
pengbaolin / SC-GPT
Few-shot Natural Language Generation for Task-Oriented Dialog
☆190 · Updated Nov 21, 2022
google-research / bleurt
BLEURT is a metric for Natural Language Generation based on transfer learning.
☆794 · Updated Aug 4, 2023
salesforce / factCC
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper
☆305 · Updated May 1, 2025
AIPHES / emnlp19-moverscore
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
☆213 · Updated Nov 20, 2023
princeton-nlp / DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o…
☆607 · Updated Jun 15, 2022
allenai / allennlp-semparse
A framework for building semantic parsers (including neural module networks) with AllenNLP, built by the authors of AllenNLP
☆109 · Updated Apr 8, 2022
taoyds / grappa
☆31 · Updated Sep 4, 2021
INK-USC / CommonGen
A Constrained Text Generation Challenge Towards Generative Commonsense Reasoning
☆142 · Updated Jan 5, 2024
Tiiiger / bert_score
BERT score for text generation
☆1,907 · Updated Jul 30, 2024
harvardnlp / boxscore-data
☆115 · Updated Mar 21, 2022
tzshi / squall
Data and Code Release for "On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries"
☆55 · Updated Nov 9, 2020
XinyuanLu00 / SciTab
The project page for "SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables"
☆23 · Updated Dec 21, 2023
anjbapat / D2T
Text generation from structured data
☆10 · Updated Dec 2, 2019
facebookresearch / multihop_dense_retrieval
Multi-hop dense retrieval for question answering
☆217 · Updated Oct 12, 2021