jinlanfu/GPTScore

Source Code of Paper "GPTScore: Evaluate as You Desire"

☆258

Alternatives and similar repositories for GPTScore

Users that are interested in GPTScore are comparing it to the libraries listed below.

nlpyang / geval
Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
☆430 · Feb 4, 2024 · Updated 2 years ago
krystalan / chatgpt_as_nlg_evaluator
Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study
☆43 · Mar 8, 2023 · Updated 3 years ago
maszhongming / UniEval
Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation
☆217 · Feb 10, 2024 · Updated 2 years ago
Yale-LILY / ROSE
☆41 · Jun 7, 2023 · Updated 3 years ago
google / BEGIN-dataset
A benchmark dataset for evaluating dialog system and natural language generation metrics.
☆39 · Jun 13, 2022 · Updated 4 years ago
microsoft / iclr2019-learning-to-represent-edits
Code for the ICLR 2019 paper "Learning to Represent Edits"
☆13 · Dec 8, 2022 · Updated 3 years ago
kaistAI / FLASK
[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
☆218 · Dec 24, 2023 · Updated 2 years ago
shmsw25 / FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…
☆450 · Apr 13, 2025 · Updated last year
exe1023 / DialEvalMetrics
☆62 · Oct 30, 2022 · Updated 3 years ago
neulab / BARTScore
BARTScore: Evaluating Generated Text as Text Generation
☆368 · Jun 27, 2022 · Updated 4 years ago
psunlpgroup / MACSum
Dataset, metrics, and models for TACL 2023 paper MACSUM: Controllable Summarization with Mixed Attributes.
☆34 · Jul 25, 2023 · Updated 3 years ago
potsawee / selfcheckgpt
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
☆628 · Jun 26, 2024 · Updated 2 years ago
i-Eval / FairEval
☆145 · Sep 10, 2023 · Updated 2 years ago
yuh-zha / AlignScore
ACL2023 - AlignScore, a metric for factual consistency evaluation.
☆164 · Mar 11, 2024 · Updated 2 years ago
tingofurro / summac
Codebase, data and models for the SummaC paper in TACL
☆110 · Jan 30, 2025 · Updated last year
thu-coai / CTRLEval
Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)
☆33 · Jun 6, 2022 · Updated 4 years ago
kite99520 / DialSummEval
Resources for paper "DialSummEval: Revisiting summarization evaluation for dialogues"
☆14 · Jul 22, 2025 · Updated last year
krafton-ai / MPC
The git repository of Modular Prompted Chatbot paper
☆35 · May 24, 2023 · Updated 3 years ago
MilkWhite / LLMs_for_Reference_Free_Text_Quality_Evaluation
☆11 · Apr 13, 2023 · Updated 3 years ago
txsun1997 / Metric-Fairness
EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation
☆41 · Oct 19, 2022 · Updated 3 years ago
ruixiangcui / AGIEval
☆774 · Jun 13, 2024 · Updated 2 years ago
d223302 / A-Closer-Look-To-LLM-Evaluation
Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"
☆19 · Oct 9, 2023 · Updated 2 years ago
FranxYao / Complexity-Based-Prompting
Complexity Based Prompting for Multi-Step Reasoning
☆17 · Mar 10, 2023 · Updated 3 years ago
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31 · Mar 5, 2024 · Updated 2 years ago
jeffhj / LM-reasoning
This repository contains a collection of papers and resources on Reasoning in Large Language Models.
☆572 · Nov 13, 2023 · Updated 2 years ago
nlp-waseda / mtl-eadrg
Emotion-Aware Dialogue Response Generation by Multi-Task Learning
☆13 · Jan 22, 2022 · Updated 4 years ago
THU-KEG / EvaluationPapers4ChatGPT
Resource, Evaluation and Detection Papers for ChatGPT
☆456 · Mar 21, 2024 · Updated 2 years ago
Tiiiger / bert_score
BERT score for text generation
☆1,909 · Jul 30, 2024 · Updated last year
Timothyxxx / RetrivalLMPapers
Paper collections of retrieval-based (augmented) language model.
☆233 · May 24, 2024 · Updated 2 years ago
allenai / FineGrainedRLHF
☆283 · Jan 6, 2025 · Updated last year
xiangyue9607 / QVE
Code for the ACL2022 paper "Synthetic Question Value Estimation for Domain Adaptation of Question Answering"
☆18 · Mar 21, 2022 · Updated 4 years ago
passing2961 / PersonaChatGen
🎭 Official code and dataset for our CCGPK@COLING 2022 paper - "PersonaChatGen: Generating Personalized Dialogue using GPT-3"
☆13 · Mar 26, 2024 · Updated 2 years ago
Alsace08 / SumCoT
[ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)"
☆54 · Jan 21, 2024 · Updated 2 years ago
tzshi / squall
Data and Code Release for "On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries"
☆55 · Nov 9, 2020 · Updated 5 years ago
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆138 · Jul 8, 2024 · Updated 2 years ago
ffaltings / InteractiveTextGeneration
☆34 · Mar 25, 2023 · Updated 3 years ago
JeremyAlain / imitation_learning_from_language_feedback
This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale"
☆26 · Mar 30, 2023 · Updated 3 years ago
Timothyxxx / Chain-of-ThoughtsPapers
A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
☆2,105 · Oct 5, 2023 · Updated 2 years ago
luohongyin / UniLC
Interpretable unified language safety checking with large language models
☆32 · Apr 15, 2023 · Updated 3 years ago