zhchen18/ToMBench

ToMBench: Benchmarking Theory of Mind in Large Language Models, ACL 2024.

☆68

Alternatives and similar repositories for ToMBench

Users that are interested in ToMBench are comparing it to the libraries listed below.

ying-hui-he / Hi-ToM_dataset
☆21 Oct 11, 2025 Updated 9 months ago
cicl-stanford / procedural-evals-tom
☆40 Jul 16, 2023 Updated 3 years ago
salavi / Clever_Hans_or_N-ToM
☆12 May 6, 2024 Updated 2 years ago
skywalker023 / fantom
👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"
☆62 May 31, 2024 Updated 2 years ago
Mars-tin / awesome-theory-of-mind
Machine Theory of Mind Reading List. Built upon EMNLP Findings 2023 Paper: Towards A Holistic Landscape of Situated Theory of Mind in Lar…
☆154 Jun 11, 2026 Updated last month
CUHK-ARISE / LLMPersonality
Code and data for the paper: On the Reliability of Psychological Scales on Large Language Models
☆30 Dec 15, 2025 Updated 7 months ago
Sahandfer / EmoBench
[ACL24] EmoBench: Evaluating the Emotional Intelligence of Large Language Models
☆117 May 16, 2025 Updated last year
clbaker / BToM
☆40 Mar 20, 2017 Updated 9 years ago
sileod / llm-theory-of-mind
Testing Theory of Mind (ToM) in language models with epistemic logic
☆22 Jul 3, 2026 Updated 2 weeks ago
shawnsihyunlee / simulatedtom
Public repository for "Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities".
☆25 Aug 16, 2023 Updated 2 years ago
SCAI-JHU / MuMA-ToM
[AAAI 2025 𝐎𝐫𝐚𝐥] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
☆41 Jun 28, 2026 Updated 3 weeks ago
franticnerd / triovecevent
☆13 Aug 23, 2017 Updated 8 years ago
conceptmath / conceptmath
[ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large …
☆26 May 29, 2024 Updated 2 years ago
kayburns / tom-qa-dataset
☆24 Oct 31, 2018 Updated 7 years ago
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆28 Jun 27, 2024 Updated 2 years ago
amy-deng / cape
☆12 Oct 24, 2022 Updated 3 years ago
nku-zhichengzhang / MART
[CVPR 2024] This is the official implementation of "MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Disti…
☆22 Jun 14, 2025 Updated last year
stacyste / TheoryOfMindInferenceModels
☆28 Nov 22, 2019 Updated 6 years ago
sotopia-lab / sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
☆317 Jun 5, 2026 Updated last month
uyaseen / bionlp-ost-2019
MIC-CIS entry in PharmaCoNER, Bacteria Biotope (BB 2019) & SeeDev 2019 Shared Tasks in EMNLP '19
☆11 Feb 22, 2020 Updated 6 years ago
Lab-ANT / Time2State
An unsupervised framework for inferring the latent states in time series data
☆20 Mar 18, 2024 Updated 2 years ago
facebookresearch / dualformer
Implementation of Dualformer
☆25 Mar 1, 2025 Updated last year
SCAI-JHU / AutoToM
[NeurIPS 2025 𝐒𝐩𝐨𝐭𝐥𝐢𝐠𝐡𝐭] AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
☆45 Jun 28, 2026 Updated 3 weeks ago
allenai / faithful-nmn
Evaluating and improving the faithfulness of the interpretations offered by Neural Module Networks
☆13 Jun 12, 2023 Updated 3 years ago
yning / nested_Multi_Instance_Learning
☆18 Sep 8, 2017 Updated 8 years ago
sotopia-lab / sotopia-pi
Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)
☆85 May 7, 2024 Updated 2 years ago
sotopia-lab / awesome-social-agents
A collection of works that investigate social agents, simulations and their real-world impact in text, embodied, and robotics contexts.
☆113 Jun 14, 2026 Updated last month
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31 Mar 5, 2024 Updated 2 years ago
AIM3-RUC / MPMQA
Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023)
☆21 Nov 28, 2022 Updated 3 years ago
julianje / Bishop
Mental state inference from observable behavior
☆15 Dec 3, 2021 Updated 4 years ago
neulab / ToM-Language-Acquisition
Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind".
☆15 Apr 27, 2023 Updated 3 years ago
h-jia / batch_normalized_LSTM
Implementation of batch normalization LSTM in PyTorch.
☆12 Jul 10, 2019 Updated 7 years ago
ZJU-REAL / VerifyBench
[ICLR 2026] VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
☆21 Feb 18, 2026 Updated 5 months ago
HITSZ-HLT / ST-w-Scorer-ABSA
Released code for "Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction" in ACL 2024.
☆24 Feb 21, 2025 Updated last year
Sea94 / EQT
☆19 Nov 15, 2023 Updated 2 years ago
zwq2018 / Agent-Pro
The Code Repo for Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
☆129 Sep 2, 2024 Updated last year
hailiang-wang / SmartQA-System
Intelligent medical question-answering system
☆16 Mar 30, 2017 Updated 9 years ago
pranscript / ETH-NFT-Twitter-sales-bot
Twitter-NFT sales bot that tweets individual and sweep sales with images from Opensea, Looksrare, X2Y2, and Blur using Opensea/Looksrare …
☆13 Jul 27, 2023 Updated 2 years ago
google-deepmind / exedec
☆14 May 9, 2024 Updated 2 years ago