Source Code of Paper "GPTScore: Evaluate as You Desire"
☆258 · Feb 21, 2023 · Updated 3 years ago
Alternatives and similar repositories for GPTScore
Users interested in GPTScore are comparing it to the repositories listed below.
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Mar 8, 2023 · Updated 2 years ago
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ☆409 · Feb 4, 2024 · Updated 2 years ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆215 · Feb 10, 2024 · Updated 2 years ago
- A benchmark dataset for evaluating dialog system and natural language generation metrics. ☆39 · Jun 13, 2022 · Updated 3 years ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆415 · Apr 13, 2025 · Updated 10 months ago
- ☆39 · Jun 7, 2023 · Updated 2 years ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆217 · Dec 24, 2023 · Updated 2 years ago
- Code for the ICLR 2019 paper "Learning to Represent Edits" ☆13 · Dec 8, 2022 · Updated 3 years ago
- ☆62 · Oct 30, 2022 · Updated 3 years ago
- BARTScore: Evaluating Generated Text as Text Generation ☆367 · Jun 27, 2022 · Updated 3 years ago
- Dataset, metrics, and models for TACL 2023 paper MACSUM: Controllable Summarization with Mixed Attributes. ☆34 · Jul 25, 2023 · Updated 2 years ago
- EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation ☆41 · Oct 19, 2022 · Updated 3 years ago
- Resources for paper "DialSummEval: Revisiting summarization evaluation for dialogues" ☆15 · Jul 22, 2025 · Updated 7 months ago
- ☆282 · Jan 6, 2025 · Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆30 · Mar 5, 2024 · Updated last year
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆152 · Mar 11, 2024 · Updated last year
- The git repository of Modular Prompted Chatbot paper ☆35 · May 24, 2023 · Updated 2 years ago
- Codebase, data and models for the SummaC paper in TACL ☆108 · Jan 30, 2025 · Updated last year
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆602 · Jun 26, 2024 · Updated last year
- ☆144 · Sep 10, 2023 · Updated 2 years ago
- ☆772 · Jun 13, 2024 · Updated last year
- Complexity Based Prompting for Multi-Step Reasoning ☆17 · Mar 10, 2023 · Updated 2 years ago
- BERT score for text generation ☆1,876 · Jul 30, 2024 · Updated last year
- Resource, Evaluation and Detection Papers for ChatGPT ☆456 · Mar 21, 2024 · Updated last year
- 🎭 Official code and dataset for our CCGPK@COLING 2022 paper - "PersonaChatGen: Generating Personalized Dialogue using GPT-3" ☆13 · Mar 26, 2024 · Updated last year
- ☆34 · Mar 25, 2023 · Updated 2 years ago
- ICML'2022: Black-Box Tuning for Language-Model-as-a-Service & EMNLP'2022: BBTv2: Towards a Gradient-Free Future with Large Language Model… ☆271 · Nov 8, 2022 · Updated 3 years ago
- Paper collections of retrieval-based (augmented) language model. ☆232 · May 24, 2024 · Updated last year
- ☆250 · Dec 21, 2022 · Updated 3 years ago
- This repository contains a collection of papers and resources on Reasoning in Large Language Models. ☆567 · Nov 13, 2023 · Updated 2 years ago
- A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti… ☆21 · Jul 11, 2022 · Updated 3 years ago
- ☆22 · Feb 26, 2024 · Updated 2 years ago
- Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation" ☆19 · Oct 9, 2023 · Updated 2 years ago
- Expanding natural instructions ☆1,035 · Dec 11, 2023 · Updated 2 years ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆137 · Jul 8, 2024 · Updated last year
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,818 · Jun 17, 2025 · Updated 8 months ago
- ☆11 · Apr 13, 2023 · Updated 2 years ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" ☆23 · Mar 4, 2025 · Updated 11 months ago
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va… ☆12 · Nov 6, 2023 · Updated 2 years ago