google-research-datasets / AIS
AIS is an evaluation framework for assessing whether the output of natural language models contains only information about the external world that is verifiable in source documents, i.e., whether it is "Attributable to Identified Sources".
☆30 · Updated last year
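The evaluation itself is human-rated: annotators judge whether each system output is interpretable on its own and whether all of its external-world information is attributable to the cited source, and the AIS score is the fraction of outputs passing both checks. As a rough illustration only (the field names and aggregation below are assumptions for this sketch, not the repository's own schema or tooling), such ratings could be aggregated like this:

```python
# Minimal sketch of aggregating AIS-style human annotations into a score.
# NOTE: the Annotation fields and ais_score helper are illustrative assumptions,
# not part of the google-research-datasets/AIS repository.
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    interpretable: bool   # Could the rater understand the output on its own?
    attributable: bool    # Is all external-world info supported by the cited source?


def ais_score(annotations: List[Annotation]) -> float:
    """Fraction of outputs judged both interpretable and fully attributable."""
    if not annotations:
        return 0.0
    positive = sum(1 for a in annotations if a.interpretable and a.attributable)
    return positive / len(annotations)


if __name__ == "__main__":
    ratings = [
        Annotation(interpretable=True, attributable=True),
        Annotation(interpretable=True, attributable=False),
        Annotation(interpretable=False, attributable=False),
    ]
    print(f"AIS score: {ais_score(ratings):.2f}")  # -> 0.33
```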
Related projects
Alternatives and complementary repositories for AIS
- Resources for the shared task on conversational question answering (SCAI-QReCC 2021) ☆27 · Updated 2 years ago
- Code for the NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations" ☆31 · Updated last year
- Official repository for the EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆43 · Updated 3 months ago
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision" (EMNLP 2021) ☆29 · Updated last year
- Code & data for the EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics" ☆16 · Updated 2 years ago
- This repository accompanies the paper "Do Prompt-Based Models Really Understand the Meaning of Their Prompts?" ☆84 · Updated 2 years ago
- Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources" ☆19 · Updated 3 years ago
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://arxiv.org/abs/2406.13663 ☆16 · Updated last week
- Contrastive Fact Verification ☆70 · Updated 2 years ago
- Query-focused summarization data ☆41 · Updated last year
- Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022) ☆76 · Updated last year
- Dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" (EMNLP 2023) ☆40 · Updated 11 months ago
- Detect hallucinated tokens for conditional sequence generation ☆63 · Updated 2 years ago
- A challenging new benchmark, MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆30 · Updated 4 years ago
- PyTorch code for "FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization" (NAACL 2022) ☆38 · Updated 2 years ago
- Code for the paper "Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?" ☆20 · Updated 4 years ago
- Apps built using Inspired Cognition's Critique ☆58 · Updated last year
- Mr. TyDi, a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages ☆72 · Updated 2 years ago
- Hercules: Attributable and Scalable Opinion Summarization (ACL 2023) ☆18 · Updated last year