PKU-ONELab / Themis
The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
☆20 Updated 7 months ago
Alternatives and similar repositories for Themis
Users who are interested in Themis are comparing it to the repositories listed below:
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks. ☆50 Updated 2 years ago
- ☆56 Updated last year
- Collection of papers for scalable automated alignment. ☆93 Updated 11 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024) ☆49 Updated last year
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆114 Updated 4 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆95 Updated 7 months ago
- Do Large Language Models Know What They Don't Know? ☆99 Updated 11 months ago
- EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration ☆36 Updated last year
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆138 Updated last year
- Code and data for the FACTOR paper ☆52 Updated last year
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆139 Updated 5 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆131 Updated last year
- Token-level Reference-free Hallucination Detection ☆96 Updated 2 years ago
- paper list on reasoning in NLP ☆192 Updated 6 months ago
- ☆85 Updated 9 months ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆211 Updated last year
- an easy-to-use knn-mt toolkit ☆104 Updated 2 years ago
- ☆75 Updated last year
- Official Implementation of "Probing Language Models for Pre-training Data Detection" ☆20 Updated 10 months ago
- Code for the paper "Open Domain Question Answering with A Unified Knowledge Interface" (ACL 2022) ☆56 Updated 2 years ago
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey" ☆59 Updated 2 years ago
- First explanation metric (diagnostic report) for text generation evaluation ☆62 Updated 7 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆121 Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆56 Updated last year
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆59 Updated last year
- Repository for Decomposed Prompting ☆95 Updated last year
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆51 Updated 2 years ago
- A retrieval augmented sequence modeling toolkit implemented based on Fairseq ☆29 Updated 2 years ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation. ☆35 Updated last year
- Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023) ☆30 Updated last year