thu-coai / OpenMEVA
Benchmark for evaluating open-ended generation
☆48 Updated 3 months ago
Alternatives and similar repositories for OpenMEVA:
Users who are interested in OpenMEVA often compare it to the repositories listed below.
- UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation ☆58 Updated 4 years ago
- ☆31 Updated last year
- ☆90 Updated 10 months ago
- First explanation metric (diagnostic report) for text generation evaluation ☆63 Updated 7 months ago
- Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022) ☆31 Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 Updated last year
- [EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning" ☆34 Updated last year
- Official code for "Continual Prompt Tuning for Dialog State Tracking" (ACL 2022). ☆27 Updated last year
- Code and data for "Retrieval Enhanced Model for Commonsense Generation" (ACL-IJCNLP 2021). ☆28 Updated 3 years ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation. ☆31 Updated 4 months ago
- TBC ☆26 Updated 2 years ago
- [ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering ☆45 Updated 2 years ago
- EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation ☆96 Updated last year
- Code for the paper InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning ☆99 Updated last year
- Hierarchical Sketch Induction for Paraphrase Generation (Hosking et al., ACL 2022) ☆51 Updated last year
- ☆61 Updated 2 years ago
- Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Textual Style Transfer ☆34 Updated 2 years ago
- Code base of In-Context Learning for Dialogue State tracking ☆45 Updated last year
- Code for Aesop: Paraphrase Generation with Adaptive Syntactic Control (EMNLP 2021) ☆27 Updated 3 years ago
- This project maintains a reading list for general text generation tasks ☆65 Updated 3 years ago
- The Official Repository for the Automatic Dialogue Evaluation Sub-task of DSTC10 Track 5 (Automatic Evaluation and Moderation of Open-dom… ☆19 Updated 3 years ago
- We construct and introduce DIALFACT, a testing benchmark dataset of crowd-annotated conversational claims, paired with pieces of evidence fr… ☆41 Updated 2 years ago
- ☆36 Updated 10 months ago
- ☆82 Updated last year
- EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975 ☆37 Updated last year
- 🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents" ☆61 Updated last year
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆66 Updated 2 years ago
- ☆37 Updated last year
- SummScreen: A Dataset for Abstractive Screenplay Summarization (ACL 2022) ☆35 Updated 2 years ago
- Resources for paper "DialSummEval: Revisiting summarization evaluation for dialogues" ☆14 Updated 2 years ago