DavidGrangier/wikipedia-biography-dataset

This dataset gathers 728,321 biographies from Wikipedia. It is intended for evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).

☆170

Alternatives and similar repositories for wikipedia-biography-dataset

Users who are interested in wikipedia-biography-dataset are comparing it to the repositories listed below.

tyliupku / wiki2bio
Code for the AAAI 2018 paper "Table-to-text Generation by Structure-aware Seq2seq Learning"
☆153 · Jan 5, 2023 · Updated 3 years ago
akanimax / natural-language-summary-generation-from-structured-data
Implementation of the paper https://arxiv.org/abs/1709.00155. For converting information present in the form of structured data into n…
☆186 · Mar 12, 2019 · Updated 7 years ago
harvardnlp / boxscore-data
☆115 · Mar 21, 2022 · Updated 4 years ago
harvardnlp / data2text
☆156 · May 8, 2019 · Updated 7 years ago
ratishsp / data2text-plan-py
Code for the AAAI 2019 paper on Data-to-Text Generation with Content Selection and Planning
☆161 · Oct 7, 2021 · Updated 4 years ago
harvardnlp / neural-template-gen
☆266 · Jun 9, 2022 · Updated 4 years ago
uwnlp / neural-checklist
This repository contains the code from "Globally Coherent Text Generation with Neural Checklist Models" by Chloe Kiddon, Luke Zettlem…
☆40 · Mar 1, 2021 · Updated 5 years ago
msra-nlc / Table2Text
☆10 · Apr 16, 2019 · Updated 7 years ago
ThiagoCF05 / webnlg
The enriched version of the WebNLG corpus described at INLG 2018
☆71 · Mar 25, 2021 · Updated 5 years ago
czyssrs / Few-Shot-NLG
Code and data for the ACL 2020 paper "Few-Shot NLG with Pre-Trained Language Model"
☆188 · May 23, 2025 · Updated last year
Yale-LILY / dart
Dataset for the NAACL 2021 paper "DART: Open-Domain Structured Data Record to Text Generation"
☆158 · Nov 21, 2022 · Updated 3 years ago
njuzrs / dialogue_distillation
☆15 · Nov 3, 2022 · Updated 3 years ago
wenhuchen / GPT2-Logic2Text
Code for the Template-GPT-2 generation model for the Logic2Text dataset
☆18 · Jun 1, 2020 · Updated 6 years ago
baijiangliang / year2018
Annual report for programmers.
☆21 · Jan 3, 2019 · Updated 7 years ago
hengyicai / ContrastiveLearning4Dialogue
The codebase for "Group-wise Contrastive Learning for Neural Dialogue Generation" (Cai et al., Findings of EMNLP 2020)
☆55 · Feb 24, 2021 · Updated 5 years ago
suwangcompling / Modeling-Semantic-Plausibility-NAACL18
Data and all
☆14 · Sep 30, 2019 · Updated 6 years ago
ratishsp / mlb-data-scripts
Scripts to create the MLB dataset introduced in the paper "Data-to-text Generation with Entity Modeling"
☆14 · Feb 9, 2021 · Updated 5 years ago
ratishsp / data2text-1
☆18 · May 13, 2021 · Updated 5 years ago
jifan-chen / QA-Verification-Via-NLI
Code and dataset for the EMNLP 2021 Findings paper "Can NLI Models Verify QA Systems’ Predictions?"
☆24 · Jul 21, 2023 · Updated 3 years ago
wenhuchen / LogicNLG
The data and code for the ACL 2020 paper "Logical Natural Language Generation from Open-Domain Tables"
☆166 · Oct 8, 2022 · Updated 3 years ago
diegma / graph-2-text
Graph-to-sequence model implemented in PyTorch, combining graph convolutional networks and OpenNMT-py
☆153 · Jul 12, 2019 · Updated 7 years ago
ratishsp / data2text-entity-py
Code for the ACL 2019 paper on Data-to-text Generation with Entity Modeling
☆74 · Nov 5, 2021 · Updated 4 years ago
tuetschek / e2e-cleaning
Cleaned E2E NLG Challenge data + supporting scripts
☆24 · Jan 19, 2021 · Updated 5 years ago
andychisholm / tf-mimo
Deep learning toolkit for multi-input multi-output sequence modelling with TensorFlow
☆18 · Jan 18, 2018 · Updated 8 years ago
shawnwun / RNNLG
RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is rele…
☆490 · Jul 2, 2019 · Updated 7 years ago
jiangjiechen / HedModTmplGen
Code for our ACL 2019 long paper: "Ensuring Readability and Data-fidelity using Head-modifier Templates in Deep Type Description Generati…
☆11 · Nov 5, 2022 · Updated 3 years ago
mandarjoshi90 / triviaqa
Code for the TriviaQA reading comprehension dataset
☆339 · Apr 5, 2024 · Updated 2 years ago
nyu-dl / dl4ir-searchQA
☆181 · Aug 17, 2018 · Updated 7 years ago
XiangLi1999 / PosteriorControl-NLG
Posterior Control of Blackbox Generation
☆23 · May 2, 2020 · Updated 6 years ago
facebookresearch / nuanced
NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.
☆18 · Aug 24, 2021 · Updated 4 years ago
google-research-datasets / wiki-reading
This repository contains the three WikiReading datasets as used and described in WikiReading: A Novel Large-scale Language Understanding …
☆271 · May 17, 2018 · Updated 8 years ago
rikdz / GraphWriter
Code for "Text Generation from Knowledge Graphs with Graph Transformers"
☆525 · Jul 27, 2023 · Updated 2 years ago
accelerated-text / reaction-acc-text-demo
Integration between Reaction ECommerce and Accelerated Text to provide product descriptions for an e-shop.
☆13 · Feb 22, 2021 · Updated 5 years ago
czyssrs / Logic2Text
Data and code for the EMNLP 2020 paper "Logic2Text: High-Fidelity Natural Language Generation from Logical Forms"
☆71 · Mar 24, 2023 · Updated 3 years ago
google-research-datasets / ToTTo
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: giv…
☆465 · Sep 11, 2024 · Updated last year
JasonForJoy / FIRE
EMNLP 2020: Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots
☆12 · Dec 15, 2020 · Updated 5 years ago
bdhingra / quasar
Datasets for Question Answering by Search and Reading
☆70 · Jan 19, 2018 · Updated 8 years ago
tatsu-lab / mlm_inductive_bias
Code release for "On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies"
☆16 · Apr 13, 2021 · Updated 5 years ago
MultiPath / CopyNet
Incorporating copying mechanism in sequence-to-sequence learning
☆180 · May 25, 2017 · Updated 9 years ago