kstats / CIMA
☆24Updated 4 years ago
Alternatives and similar repositories for CIMA
Users that are interested in CIMA are comparing it to the libraries listed below
- Codebase, data and models for the SummaC paper in TACL☆107Updated 11 months ago
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.☆125Updated 3 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023☆107Updated last year
- GEMBA — GPT Estimation Metric Based Assessment☆141Updated last month
- ☆102Updated last year
- A tool for calculating character n-gram F-score☆76Updated 2 years ago
- Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h…☆84Updated 5 years ago
- ☆98Updated 4 months ago
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs☆187Updated 2 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations☆57Updated 3 years ago
- A Multilingual Replicable Instruction-Following Model☆95Updated 2 years ago
- Detect hallucinated tokens for conditional sequence generation.☆64Updated 3 years ago
- Official implementation of the paper "IteraTeR: Understanding Iterative Revision from Human-Written Text" (ACL 2022)☆80Updated 2 years ago
- Dataset for NAACL 2021 paper: "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization"☆141Updated 2 years ago
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning☆104Updated 4 years ago
- Find informative examples to efficiently (human)-evaluate NLG models.☆17Updated last week
- ☆102Updated 3 years ago
- ☆71Updated 4 years ago
- ☆83Updated 2 years ago
- FRANK: Factuality Evaluation Benchmark☆59Updated 3 years ago
- Data for evaluating gender bias in coreference resolution systems.☆81Updated 6 years ago
- ☆29Updated last year
- Multilingual Large Language Models Evaluation Benchmark☆133Updated last year
- ☆97Updated 3 years ago
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".☆82Updated last week
- MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization☆80Updated 4 years ago
- Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper☆308Updated 8 months ago
- ☆17Updated 7 months ago
- A library of translation-based text similarity measures☆25Updated 2 years ago
- Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric …☆79Updated 2 years ago