psunlpgroup / ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
☆29Updated 10 months ago
Alternatives and similar repositories for ReaLMistake
Users who are interested in ReaLMistake are comparing it to the repositories listed below.
- Evaluate the Quality of Critique☆35Updated last year
- ☆14Updated last year
- ☆22Updated 6 months ago
- ☆44Updated 9 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]☆30Updated 5 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆43Updated last year
- Tasks for describing differences between text distributions.☆16Updated 10 months ago
- Evaluation on Logical Reasoning and Abstract Reasoning Challenges☆27Updated 2 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc…☆32Updated last year
- Code for "RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆22Updated 3 months ago
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs"☆27Updated 3 weeks ago
- ☆41Updated last year
- ☆28Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆22Updated 2 months ago
- Supporting code for ReCEval paper☆28Updated 9 months ago
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails☆24Updated 4 months ago
- ☆43Updated 2 months ago
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators"☆33Updated 6 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆50Updated 2 weeks ago
- Code and data for paper "Context-faithful Prompting for Large Language Models".☆40Updated 2 years ago
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks☆12Updated last year
- Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication☆20Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆26Updated last year
- This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale"☆27Updated 2 years ago
- ☆15Updated 2 months ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888☆35Updated last year
- Aioli: A unified optimization framework for language model data mixing☆27Updated 5 months ago