google-research-datasets / xsum_hallucination_annotations

Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (https://www.aclweb.org/anthology/2020.acl-main.173.pdf).

☆84

Alternatives and similar repositories for xsum_hallucination_annotations

Users who are interested in xsum_hallucination_annotations compare it to the repositories listed below.

neulab / REALSumm
REALSumm: Re-evaluating Evaluation in Text Summarization
☆72 Updated last month
allenai / contrast-sets
☆59 Updated 2 years ago
esdurmus / feqa
Data and code for "A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization" (ACL 2020)
☆49 Updated 2 years ago
Yale-LILY / dart
Dataset for NAACL 2021 paper: "DART: Open-Domain Structured Data Record to Text Generation"
☆155 Updated 2 years ago
danieldeutsch / sacrerouge
SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
☆145 Updated 3 years ago
artidoro / frank
FRANK: Factuality Evaluation Benchmark
☆59 Updated 2 years ago
stangelid / qt
☆44 Updated 4 years ago
peterwestuw / surface-form-competition
☆58 Updated 3 years ago
shijie-wu / crosslingual-nlp
This repo supports various cross-lingual transfer learning & multilingual NLP models.
☆92 Updated 2 years ago
dapascual / K2T
☆71 Updated 3 years ago
cambridgeltl / xcopa
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
☆104 Updated 4 years ago
tagoyal / factuality-datasets
☆46 Updated 2 years ago
jifan-chen / QA-Verification-Via-NLI
Code and dataset for the EMNLP 2021 Findings paper "Can NLI Models Verify QA Systems’ Predictions?"
☆25 Updated 2 years ago
roeeaharoni / unsupervised-domain-clusters
Code and data accompanying our ACL 2020 paper, "Unsupervised Domain Clusters in Pretrained Language Models".
☆58 Updated 5 years ago
ThomasScialom / QuestEval
☆100 Updated last year
Shikib / usr
Code for ACL 2020 paper: USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation (https://arxiv.org/pdf/2005.0045…
☆50 Updated 2 years ago
alontalmor / oLMpics
☆46 Updated 5 years ago
danieldeutsch / qaeval
☆15 Updated 4 years ago
McGill-NLP / FaithDial
☆50 Updated 2 years ago
yangkevin2 / naacl-2021-fudge-controlled-generation
☆100 Updated 3 years ago
tagoyal / dae-factuality
☆28 Updated 2 years ago
jayded / eraserbenchmark
A benchmark for understanding and evaluating rationales: http://www.eraserbenchmark.com/
☆99 Updated 2 years ago
yanaiela / amnesic_probing
☆39 Updated 4 years ago
W4ngatang / qags
Question Answering and Generation for Summarization
☆71 Updated 2 years ago
dykang / xslue
ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy
☆15 Updated 4 years ago
tommccoy1 / hans
Heuristic Analysis for NLI Systems
☆127 Updated 4 years ago
lvyiwei1 / StylePTB
☆63 Updated 2 years ago
alisawuffles / DExperts
Code associated with the ACL 2021 DExperts paper
☆117 Updated 2 years ago
AIPHES / emnlp19-moverscore
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
☆209 Updated last year
facebookresearch / QA-Overlap
Code to support the paper "Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets"
☆66 Updated 4 years ago