PKU-ONELab / Themis
The official repository for our EMNLP 2024 paper Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
☆18 · Updated 2 months ago
Alternatives and similar repositories for Themis:
Users that are interested in Themis are comparing it to the libraries listed below
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks. ☆48 · Updated last year
- This project maintains a reading list for general text generation tasks ☆65 · Updated 3 years ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024) ☆47 · Updated 10 months ago
- OMGEval: An Open Multilingual Generative Evaluation Benchmark for Foundation Models ☆32 · Updated 7 months ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 · Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆63 · Updated 7 months ago
- The code implementation of the EMNLP 2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene… ☆25 · Updated last year
- ☆52 · Updated 6 months ago
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆49 · Updated 6 months ago
- ☆16 · Updated 11 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 · Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆55 · Updated 7 months ago
- Towards Systematic Measurement for Long Text Quality ☆31 · Updated 5 months ago
- [EMNLP 2023] ALCUNA: Large Language Models Meet New Knowledge ☆26 · Updated last year
- [ACL'24] MC^2: A Multilingual Corpus of Minority Languages in China (Tibetan, Uyghur, Kazakh, and Mongolian) ☆18 · Updated this week
- Do Large Language Models Know What They Don't Know? ☆91 · Updated 3 months ago
- ☆37 · Updated last year
- [EMNLP'24 (Main)] DRPO (Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-… ☆20 · Updated 3 months ago
- Source codes and datasets for How well do Large Language Models perform in Arithmetic tasks? ☆56 · Updated last year
- ACL'23: Unified Demonstration Retriever for In-Context Learning ☆36 · Updated last year
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆34 · Updated 7 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 3 weeks ago
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 · Updated last year
- Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication ☆18 · Updated 11 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆64 · Updated this week
- Code for the paper "InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning" ☆99 · Updated last year
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆44 · Updated 7 months ago
- ☆38 · Updated last year
- This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ☆23 · Updated last year
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆75 · Updated 3 months ago