google-research-datasets / PropSegmEnt
PropSegmEnt is an annotated dataset for segmenting English text into propositions and for recognizing proposition-level entailment relations: whether a different, related document entails each proposition, contradicts it, or does neither. It consists of clusters of closely related documents from the news and Wikipedia domains.
☆18 · Updated last year
Related projects
Alternatives and complementary repositories for PropSegmEnt
- PyTorch code for "FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization" (NAACL 2022) ☆38 · Updated 2 years ago
- Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization" ☆64 · Updated 3 years ago
- Official code repository for "Exploring Neural Models for Query-Focused Summarization". ☆48 · Updated last year
- The dataset and code for ACL 2022 paper "SciNLI: A Corpus for Natural Language Inference on Scientific Text" are released here. ☆25 · Updated last year
- The corresponding code for our paper: "Exploring the Challenges of Open Domain Multi-Document Summarization". Do not hesitate to open an … ☆31 · Updated last year
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆26 · Updated 3 years ago
- Dataset, models, and code for paper "CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation", … ☆33 · Updated 2 years ago
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · Updated 2 years ago
- This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering". ☆79 · Updated 3 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆43 · Updated 3 months ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆81 · Updated last month
- Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆72 · Updated 2 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆31 · Updated 2 years ago
- Source code for paper "Learning from Noisy Labels for Entity-Centric Information Extraction", EMNLP 2021 ☆55 · Updated 2 years ago
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆39 · Updated 10 months ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ☆54 · Updated 2 years ago
- An official repository for MIA 2022 (NAACL 2022 Workshop) Shared Task on Cross-lingual Open-Retrieval Question Answering. ☆31 · Updated 2 years ago
- Corpus exploration platform using advanced tools such as interactive summarization and multi-document coreference resolution ☆11 · Updated last year
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 · Updated 2 years ago
- FRANK: Factuality Evaluation Benchmark ☆52 · Updated last year
- Contrastive Fact Verification ☆70 · Updated 2 years ago
- Resources for the shared task on conversational question answering SCAI-QReCC 2021 ☆27 · Updated 2 years ago
- SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization. ☆138 · Updated 2 years ago
- UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi… ☆30 · Updated last year