thu-coai / OpenMEVA
Benchmark for evaluating open-ended generation
☆44 · Updated last year
Related projects:
- TBC ☆26 · Updated last year
- Code for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022) ☆32 · Updated 2 years ago
- [COLING22] An End-to-End Library for Evaluating Natural Language Generation ☆86 · Updated 9 months ago
- ☆28 · Updated last year
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆41 · Updated last year
- [EMNLP 2022] Code and data for "Controllable Dialogue Simulation with In-Context Learning" ☆33 · Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆59 · Updated 2 months ago
- ☆90 · Updated 5 months ago
- This repository contains the code for extracting the test samples we used in our paper: "A Multitask, Multilingual, Multimodal Evaluatio… ☆76 · Updated 9 months ago
- ☆60 · Updated last year
- DEMix Layers for Modular Language Modeling ☆51 · Updated 3 years ago
- Code base for In-Context Learning for Dialogue State Tracking ☆43 · Updated 11 months ago
- Code for Editing Factual Knowledge in Language Models ☆134 · Updated 2 years ago
- ☆80 · Updated last year
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation. ☆25 · Updated last month
- ☆57 · Updated 2 years ago
- ☆36 · Updated 5 months ago
- Code and data for "Retrieval Enhanced Model for Commonsense Generation" (ACL-IJCNLP 2021). ☆28 · Updated 2 years ago
- Official Code for NAACL 2022 paper: "Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation" ☆15 · Updated 2 years ago
- Code for the paper InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning ☆96 · Updated last year
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆64 · Updated last year
- ☆70 · Updated 10 months ago
- [NeurIPS 2022] Generating Training Data with Language Models: Towards Zero-Shot Language Understanding ☆63 · Updated 2 years ago
- [ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering ☆45 · Updated 2 years ago
- Code for ACL 2020 paper: USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation (https://arxiv.org/pdf/2005.0045… ☆50 · Updated last year
- ☆70 · Updated 2 years ago
- Data and code for the paper "Inducing Positive Perspectives with Text Reframing" ☆52 · Updated last year
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models ☆22 · Updated last month
- ☆60 · Updated last year
- Lexically constrained text generation with CBART. ☆47 · Updated last year