HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation.
☆35 · Oct 15, 2024 · Updated last year
Alternatives and similar repositories for hanna-benchmark-asg
Users who are interested in hanna-benchmark-asg are comparing it to the repositories listed below
- Benchmark for evaluating open-ended generation ☆51 · Nov 6, 2024 · Updated last year
- The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretab… ☆20 · Feb 23, 2025 · Updated last year
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024) ☆14 · Oct 3, 2024 · Updated last year
- ☆39 · Jun 7, 2023 · Updated 2 years ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆47 · Jan 21, 2025 · Updated last year
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va… ☆12 · Nov 6, 2023 · Updated 2 years ago
- Codes for paper "Stylized Story Generation with Style-Guided Planning" ☆12 · May 9, 2021 · Updated 4 years ago
- Code for ACL 2020 paper: USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation (https://arxiv.org/pdf/2005.0045… ☆50 · Dec 8, 2022 · Updated 3 years ago
- Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification ☆16 · Jan 8, 2024 · Updated 2 years ago
- What are the best Systems? New Perspectives on NLP Benchmarking ☆13 · Mar 16, 2023 · Updated 2 years ago
- NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM ☆39 · Dec 27, 2022 · Updated 3 years ago
- Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022) ☆33 · Jun 6, 2022 · Updated 3 years ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆43 · Jul 19, 2024 · Updated last year
- ☆18 · Oct 8, 2024 · Updated last year
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Mar 8, 2023 · Updated 3 years ago
- Personalized Story Evaluation Model ☆18 · Nov 27, 2023 · Updated 2 years ago
- ☆20 · Jan 15, 2024 · Updated 2 years ago
- ☆22 · Feb 26, 2024 · Updated 2 years ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆216 · Feb 10, 2024 · Updated 2 years ago
- ☆50 · Feb 5, 2023 · Updated 3 years ago
- The source code of the paper 'Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation' ☆24 · Mar 24, 2023 · Updated 2 years ago
- Resources for the "SummEval: Re-evaluating Summarization Evaluation" paper ☆412 · Jun 23, 2024 · Updated last year
- ☆60 · Aug 22, 2024 · Updated last year
- ☆32 · Nov 16, 2021 · Updated 4 years ago
- Code for SIGdial 2020 paper: Unsupervised Evaluation of Interactive Dialog with DialoGPT (https://arxiv.org/abs/2006.12719) ☆29 · Jun 8, 2020 · Updated 5 years ago
- An Upgraded Fire Emblem Fates Randomizer ☆10 · Jan 28, 2025 · Updated last year
- ☆71 · Oct 29, 2021 · Updated 4 years ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆417 · Apr 13, 2025 · Updated 10 months ago
- DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence ☆36 · Jul 25, 2023 · Updated 2 years ago
- Evaluate the Quality of Critique ☆36 · Jun 1, 2024 · Updated last year
- ☆35 · Jun 12, 2022 · Updated 3 years ago
- ☆10 · Nov 8, 2022 · Updated 3 years ago
- ☆34 · Jul 25, 2024 · Updated last year
- [ICLR 2021] Contrastive Learning with Adversarial Perturbations for Conditional Text Generation ☆86 · Oct 11, 2022 · Updated 3 years ago
- ☆10 · Nov 1, 2022 · Updated 3 years ago
- Codebase for LLM story generation; updated version of https://github.com/yangkevin2/doc-story-generation ☆86 · Feb 21, 2026 · Updated 2 weeks ago
- Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Textual Style Transfer ☆36 · Oct 2, 2022 · Updated 3 years ago
- ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost ☆42 · Nov 15, 2023 · Updated 2 years ago
- ☆144 · Sep 10, 2023 · Updated 2 years ago