HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation.
☆36 · Oct 15, 2024 · Updated last year
Alternatives and similar repositories for hanna-benchmark-asg
Users that are interested in hanna-benchmark-asg are comparing it to the libraries listed below.
- The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretab… ☆20 · Feb 23, 2025 · Updated last year
- Benchmark for evaluating open-ended generation ☆51 · Nov 6, 2024 · Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆49 · Jan 21, 2025 · Updated last year
- ☆40 · Jun 7, 2023 · Updated 2 years ago
- ☆35 · Jan 7, 2026 · Updated 4 months ago
- Code for ACL 2020 paper: USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation (https://arxiv.org/pdf/2005.0045… ☆50 · Dec 8, 2022 · Updated 3 years ago
- Codes for paper "Stylized Story Generation with Style-Guided Planning" ☆12 · May 9, 2021 · Updated 5 years ago
- Personalized Story Evaluation Model ☆18 · Nov 27, 2023 · Updated 2 years ago
- What are the best Systems? New Perspectives on NLP Benchmarking ☆13 · Mar 16, 2023 · Updated 3 years ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆44 · Jul 19, 2024 · Updated last year
- NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM ☆39 · Dec 27, 2022 · Updated 3 years ago
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va… ☆12 · Nov 6, 2023 · Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Mar 8, 2023 · Updated 3 years ago
- ☆23 · Feb 26, 2024 · Updated 2 years ago
- ☆18 · Oct 8, 2024 · Updated last year
- Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification ☆17 · Jan 8, 2024 · Updated 2 years ago
- ☆21 · Jan 15, 2024 · Updated 2 years ago
- BARTScore: Evaluating Generated Text as Text Generation ☆369 · Jun 27, 2022 · Updated 3 years ago
- Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022) ☆33 · Jun 6, 2022 · Updated 3 years ago
- ☆59 · Aug 22, 2024 · Updated last year
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery ☆20 · Sep 24, 2025 · Updated 7 months ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆217 · Feb 10, 2024 · Updated 2 years ago
- ☆35 · Jun 12, 2022 · Updated 3 years ago
- Resources for the "SummEval: Re-evaluating Summarization Evaluation" paper ☆415 · Jun 23, 2024 · Updated last year
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Cleaned up version of the PlotMachines code ☆68 · Jun 12, 2023 · Updated 2 years ago
- ☆15 · Apr 16, 2025 · Updated last year
- ☆10 · Jan 28, 2024 · Updated 2 years ago
- Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Textual Style Transfer ☆36 · Oct 2, 2022 · Updated 3 years ago
- Python 3 support for the MS COCO caption evaluation tools ☆14 · Jun 14, 2024 · Updated last year
- ☆14 · Aug 9, 2024 · Updated last year
- ☆13 · Oct 28, 2020 · Updated 5 years ago
- ☆16 · Jun 25, 2025 · Updated 10 months ago
- DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence ☆37 · Jul 25, 2023 · Updated 2 years ago
- This repository is created for recording the paper I read every day, so as to facilitate my review and push myself to learn. ☆13 · Oct 18, 2020 · Updated 5 years ago
- Data Valuation on In-Context Examples (ACL23) ☆24 · Jan 12, 2025 · Updated last year
- ☆38 · Aug 3, 2022 · Updated 3 years ago
- Materials for the course "Computational Linguistics and Information Technologies" for fourth-year undergraduate students in the "Fundamental and Appl… ☆10 · Mar 25, 2021 · Updated 5 years ago
- Deep Learning Seminar -- ÚFAL course NPFL117 ☆17 · Nov 22, 2022 · Updated 3 years ago