psunlpgroup / ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
☆30 Updated last year
Alternatives and similar repositories for ReaLMistake
Users who are interested in ReaLMistake compare it to the repositories listed below.
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- Evaluate the Quality of Critique ☆36 Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆82 Updated last year
- ☆53 Updated last year
- This repository contains data, code and models for contextual noncompliance. ☆24 Updated last year
- ☆102 Updated 11 months ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆71 Updated last year
- ☆103 Updated last year
- ☆74 Updated last year
- Contrastive Chain-of-Thought Prompting ☆68 Updated last year
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆70 Updated last year
- ☆22 Updated 10 months ago
- ☆48 Updated 6 months ago
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs"