PKU-ONELab / Themis

The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.

☆20

Alternatives and similar repositories for Themis

Users who are interested in Themis are comparing it to the repositories listed below.

Spico197 / awesome-lm-evaluation
🩺 A collection of ChatGPT evaluation reports on various benchmarks.
☆49 Updated 2 years ago
icip-cas / awesome-auto-alignment
Collection of papers for scalable automated alignment.
☆90 Updated 7 months ago
Abbey4799 / CELLO
Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024)
☆48 Updated last year
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆83 Updated 3 months ago
qinyiwei / InfoBench
☆53 Updated 9 months ago
OpenLMLab / LongWanjuan
Towards Systematic Measurement for Long Text Quality
☆35 Updated 9 months ago
DAMO-NLP-SG / TempReason
☆32 Updated last year
xu1998hz / InstructScore_SEScore3
First explanation metric (diagnostic report) for text generation evaluation
☆62 Updated 3 months ago
GanjinZero / math401-llm
Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?"
☆56 Updated 2 years ago
krystalan / chatgpt_as_nlg_evaluator
Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study
☆43 Updated 2 years ago
ntunlp / LLMSanitize
An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).
☆57 Updated 9 months ago
microsoft / HaDes
Token-level Reference-free Hallucination Detection
☆94 Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆50 Updated 5 months ago
lemaoliu / retrieval-generation-reading-list
This project maintains a reading list for general text generation tasks
☆65 Updated 3 years ago
PlusLabNLP / Active-IT
Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"
☆24 Updated last year
Alsace08 / SumCoT
[ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)"
☆53 Updated last year
RUCAIBox / Language-Specific-Neurons
☆75 Updated 5 months ago
YJiangcm / FollowBench
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆100 Updated last month
sunlab-osu / Understanding-CoT
☆86 Updated 2 years ago
yinzhangyue / SelfAware
Do Large Language Models Know What They Don’t Know?
☆95 Updated 6 months ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆62 Updated 10 months ago
microsoft / HiTab
[ACL 2022] A hierarchical table dataset for question answering and data-to-text generation.
☆89 Updated 2 months ago
qiancheng0 / CREATOR
This is the repository for the paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models"
☆24 Updated last year
yzjiao / On-Demand-IE
Code and dataset for the EMNLP paper "Instruct and Extract: Instruction Tuning for On-Demand Information Extraction"
☆52 Updated last year
csitfun / LogiCoT
Instructions and demonstrations for building a GLM capable of formal logical reasoning
☆53 Updated 9 months ago
chujiezheng / LLM-Extrapolation
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆73 Updated 2 weeks ago
KwanWaiChung / MT-Eval
Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models"
☆41 Updated 7 months ago
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58 Updated last year
AI21Labs / factor
Code and data for the FACTOR paper
☆46 Updated last year
thunlp / Prompt-Transferability
On Transferability of Prompt Tuning for Natural Language Processing
☆99 Updated last year