nttmdlab-nlp / ToMATO
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind (AAAI2025)
☆16Updated 5 months ago
Alternatives and similar repositories for ToMATO
Users who are interested in ToMATO are comparing it to the repositories listed below.
- List of papers on Self-Correction of LLMs.☆78Updated 9 months ago
- Aligned, Review-Informed Edits of Scientific Papers☆54Updated 2 years ago
- Code repository for the paper "Mission: Impossible Language Models."☆54Updated 2 weeks ago
- ☆57Updated 10 months ago
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆12Updated last year
- ☆52Updated last year
- ☆21Updated 5 months ago
- [ICML 2024] Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations☆14Updated last year
- Tree prompting: easy-to-use scikit-learn interface for improved prompting.☆41Updated last year
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆78Updated last year
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".☆30Updated last year
- Evaluate the Quality of Critique☆36Updated last year
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …☆42Updated 11 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆82Updated last year
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆49Updated 11 months ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents☆110Updated 10 months ago
- ☆25Updated last year
- Code/data for MARG (multi-agent review generation)☆51Updated last week
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages☆50Updated 2 months ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"☆55Updated last year
- InstructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆32Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- Self-Supervised Alignment with Mutual Information☆21Updated last year
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI☆105Updated 7 months ago
- ☆27Updated 7 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆43Updated last week
- Byte-sized text games for code generation tasks on virtual environments☆20Updated last year
- ☆29Updated last year
- ☆74Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆29Updated last year