PKU-ONELab / Themis
The official repository for our EMNLP 2024 paper Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
☆18 Updated last month
Alternatives and similar repositories for Themis:
Users who are interested in Themis are comparing it to the repositories listed below.
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024) ☆46 Updated 9 months ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 Updated last year
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks. ☆48 Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆62 Updated 6 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆57 Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆46 Updated 5 months ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆33 Updated 6 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆61 Updated 5 months ago
- ☆36 Updated last year
- Code and data for the FACTOR paper ☆44 Updated last year
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆39 Updated last year
- ☆52 Updated 4 months ago
- ACL'23: Unified Demonstration Retriever for In-Context Learning ☆34 Updated last year
- The code implementation of the EMNLP2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene… ☆25 Updated last year
- Collection of papers for scalable automated alignment. ☆82 Updated 2 months ago
- Do Large Language Models Know What They Don’t Know? ☆88 Updated 2 months ago
- Code for "FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models (ACL 2024)" ☆94 Updated last month
- This project maintains a reading list for general text generation tasks ☆65 Updated 3 years ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆20 Updated 2 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆44 Updated 3 weeks ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆55 Updated 6 months ago
- Towards Systematic Measurement for Long Text Quality ☆31 Updated 4 months ago
- [ICML'2024] Can AI Assistants Know What They Don't Know? ☆76 Updated 11 months ago
- ☆70 Updated 11 months ago
- ☆38 Updated last year
- OMGEval😮: An Open Multilingual Generative Evaluation Benchmark for Foundation Models ☆32 Updated 6 months ago
- This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ☆23 Updated last year
- ☆64 Updated 11 months ago
- ☆23 Updated last year
- ☆33 Updated 9 months ago