google-research-datasets / AISLinks

AIS is an evaluation framework for assessing whether the output of natural language models contains only information about the external world that is verifiable in source documents, that is, whether it is "Attributable to Identified Sources".

☆31

Alternatives and similar repositories for AIS

Users who are interested in AIS are comparing it to the repositories listed below.

awebson / prompt_semantics
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”
☆85 Updated 3 years ago
google-research / dialog-inpainting
☆97 Updated 2 years ago
McGill-NLP / FaithDial
☆51 Updated 2 years ago
inspired-cognition / critique-apps
Apps built using Inspired Cognition's Critique.
☆58 Updated 2 years ago
vipulraheja / iterater
Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022)
☆78 Updated last year
anthonywchen / MOCHA
Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics".
☆16 Updated 3 years ago
sunlab-osu / ReasonBERT
Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021
☆29 Updated 2 years ago
peterwestuw / surface-form-competition
☆58 Updated 3 years ago
google-deepmind / streamingqa
☆48 Updated last year
nyu-mll / SQuALITY
Query-focused summarization data
☆42 Updated 2 years ago
martiansideofthemoon / longeval-summarization
Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…
☆44 Updated 11 months ago
amazon-science / mintaka
Dataset from the paper "Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering" (COLING 2022)
☆114 Updated 2 years ago
cambridgeltl / xcopa
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
☆103 Updated 4 years ago
allenai / flex
Few-shot NLP benchmark for unified, rigorous eval
☆91 Updated 3 years ago
mingdachen / WikiTableT
Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources"
☆20 Updated 4 years ago
nelson-liu / evaluating-verifiability-in-generative-search-engines
Companion repo for "Evaluating Verifiability in Generative Search Engines".
☆83 Updated 2 years ago
salesforce / factualNLG
Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"
☆59 Updated 6 months ago
yanaiela / pararel
☆45 Updated last year
jzbjyb / lm-calibration
☆35 Updated 3 years ago
violet-zct / fairseq-detect-hallucination
Detect hallucinated tokens for conditional sequence generation.
☆64 Updated 3 years ago
allenai / natural-instructions-v1
Benchmarking Generalization to New Tasks from Natural Language Instructions
☆26 Updated 4 years ago
TevenLeScao / pet
This repository contains the code for "How many data points is a prompt worth?"
☆48 Updated 4 years ago
martiansideofthemoon / relic-retrieval
Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu).
☆20 Updated 3 years ago
google-research / true
Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".
☆81 Updated 3 weeks ago
tanyuqian / ctc-gen-eval
EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation
☆97 Updated 2 years ago
microsoft / HaDes
Token-level Reference-free Hallucination Detection
☆95 Updated 2 years ago
machelreid / m2d2
M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
☆54 Updated 2 years ago
oriram / spider
☆54 Updated 2 years ago
ryokamoi / wice
This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023.
☆41 Updated last year
allenai / few_shot_explanations
Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations"
☆31 Updated 2 years ago