HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation.
☆37 · Oct 15, 2024 · Updated last year
Alternatives and similar repositories for hanna-benchmark-asg
Users that are interested in hanna-benchmark-asg are comparing it to the libraries listed below.
- Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments (Zhou et al., EMNLP 2024) ☆14 · Oct 3, 2024 · Updated last year
- The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretab… ☆20 · Feb 23, 2025 · Updated last year
- Benchmark for evaluating open-ended generation ☆50 · Nov 6, 2024 · Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆49 · Jan 21, 2025 · Updated last year
- ☆40 · Jun 7, 2023 · Updated 3 years ago
- ☆34 · Jan 7, 2026 · Updated 5 months ago
- Code for ACL 2020 paper: USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation (https://arxiv.org/pdf/2005.0045… ☆50 · Dec 8, 2022 · Updated 3 years ago
- Personalized Story Evaluation Model ☆17 · Nov 27, 2023 · Updated 2 years ago
- What are the best Systems? New Perspectives on NLP Benchmarking