Source Code of Paper "GPTScore: Evaluate as You Desire"
☆259 · Feb 21, 2023 · Updated 3 years ago
Alternatives and similar repositories for GPTScore
Users that are interested in GPTScore are comparing it to the libraries listed below.
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ☆411 · Feb 4, 2024 · Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Mar 8, 2023 · Updated 3 years ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆216 · Feb 10, 2024 · Updated 2 years ago
- ☆40 · Jun 7, 2023 · Updated 2 years ago
- A benchmark dataset for evaluating dialog system and natural language generation metrics. ☆39 · Jun 13, 2022 · Updated 3 years ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆217 · Dec 24, 2023 · Updated 2 years ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆423 · Apr 13, 2025 · Updated 11 months ago
- ☆63 · Oct 30, 2022 · Updated 3 years ago
- BARTScore: Evaluating Generated Text as Text Generation ☆367 · Jun 27, 2022 · Updated 3 years ago
- Dataset, metrics, and models for TACL 2023 paper MACSUM: Controllable Summarization with Mixed Attributes. ☆34 · Jul 25, 2023 · Updated 2 years ago
- Code for the ICLR 2019 paper "Learning to Represent Edits" ☆13 · Dec 8, 2022 · Updated 3 years ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆606 · Jun 26, 2024 · Updated last year
- ☆144 · Sep 10, 2023 · Updated 2 years ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆155 · Mar 11, 2024 · Updated 2 years ago
- Codebase, data and models for the SummaC paper in TACL ☆108 · Jan 30, 2025 · Updated last year
- Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022) ☆33 · Jun 6, 2022 · Updated 3 years ago
- Resources for paper "DialSummEval: Revisiting summarization evaluation for dialogues" ☆15 · Jul 22, 2025 · Updated 8 months ago
- ☆11 · Apr 13, 2023 · Updated 2 years ago
- The git repository of Modular Prompted Chatbot paper ☆35 · May 24, 2023 · Updated 2 years ago
- EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation ☆41 · Oct 19, 2022 · Updated 3 years ago
- ☆771 · Jun 13, 2024 · Updated last year
- Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation" ☆19 · Oct 9, 2023 · Updated 2 years ago
- Emotion-Aware Dialogue Response Generation by Multi-Task Learning ☆13 · Jan 22, 2022 · Updated 4 years ago
- Complexity Based Prompting for Multi-Step Reasoning ☆17 · Mar 10, 2023 · Updated 3 years ago
- BERT score for text generation ☆1,882 · Jul 30, 2024 · Updated last year
- This repository contains a collection of papers and resources on Reasoning in Large Language Models. ☆569 · Nov 13, 2023 · Updated 2 years ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆30 · Mar 5, 2024 · Updated 2 years ago
- Resource, Evaluation and Detection Papers for ChatGPT ☆455 · Mar 21, 2024 · Updated 2 years ago
- Paper collections of retrieval-based (augmented) language model. ☆232 · May 24, 2024 · Updated last year
- ☆282 · Jan 6, 2025 · Updated last year
- Code for the ACL2022 paper "Synthetic Question Value Estimation for Domain Adaptation of Question Answering" ☆17 · Mar 21, 2022 · Updated 4 years ago
- 🎭 Official code and dataset for our CCGPK@COLING 2022 paper - "PersonaChatGen: Generating Personalized Dialogue using GPT-3" ☆13 · Mar 26, 2024 · Updated last year
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation ☆134 · Oct 25, 2023 · Updated 2 years ago
- ☆34 · Mar 25, 2023 · Updated 2 years ago
- Data and Code Release for "On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries" ☆55 · Nov 9, 2020 · Updated 5 years ago
- Interpretable unified language safety checking with large language models ☆32 · Apr 15, 2023 · Updated 2 years ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆137 · Jul 8, 2024 · Updated last year
- Question Answering and Generation for Summarization ☆72 · Nov 27, 2022 · Updated 3 years ago
- ☆22 · Feb 26, 2024 · Updated 2 years ago