PKU-ONELab / Themis
The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
☆20 · Updated 3 months ago
Alternatives and similar repositories for Themis
Users who are interested in Themis are comparing it to the repositories listed below.
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks. ☆49 · Updated 2 years ago
- Collection of papers for scalable automated alignment. ☆90 · Updated 7 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?"(AAAI2024) ☆48 · Updated last year
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆83 · Updated 3 months ago
- ☆53 · Updated 9 months ago
- Towards Systematic Measurement for Long Text Quality ☆35 · Updated 9 months ago
- ☆32 · Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆62 · Updated 3 months ago
- Source codes and datasets for How well do Large Language Models perform in Arithmetic tasks? ☆56 · Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Updated 2 years ago
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆57 · Updated 9 months ago
- Token-level Reference-free Hallucination Detection ☆94 · Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆50 · Updated 5 months ago
- This project maintains a reading list for general text generation tasks ☆65 · Updated 3 years ago
- Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks" ☆24 · Updated last year
- [ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)" ☆53 · Updated last year
- ☆75 · Updated 5 months ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆100 · Updated last month
- ☆86 · Updated 2 years ago
- Do Large Language Models Know What They Don’t Know? ☆95 · Updated 6 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆62 · Updated 10 months ago
- [ACL 2022] A hierarchical table dataset for question answering and data-to-text generation. ☆89 · Updated 2 months ago
- This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ☆24 · Updated last year
- Code and dataset for the emnlp paper titled Instruct and Extract: Instruction Tuning for On-Demand Information Extraction ☆52 · Updated last year
- the instructions and demonstrations for building a formal logical reasoning capable GLM ☆53 · Updated 9 months ago
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment" ☆73 · Updated 2 weeks ago
- Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models" ☆41 · Updated 7 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated last year
- Code and data for the FACTOR paper ☆46 · Updated last year
- On Transferability of Prompt Tuning for Natural Language Processing ☆99 · Updated last year