EgoToM is an egocentric theory-of-mind benchmark built on Ego4D videos, containing multiple-choice questions that evaluate multimodal large language models' ability to infer a camera wearer's goals, in-the-moment belief states, and future actions.
☆13Apr 1, 2025Updated 11 months ago
Alternatives and similar repositories for EgoToM
Users that are interested in EgoToM are comparing it to the libraries listed below.
- Official PyTorch codebase for the Modeling Caption Diversity in Contrastive Vision-Language Pretraining paper.☆18Mar 28, 2025Updated 11 months ago
- Dataset and evaluation benchmark for Privacy Leakage Evaluation of Autonomous Web Agents☆36Updated this week
- [ICLR 26] Official Implementation of MaskInversion☆31Feb 28, 2026Updated 3 weeks ago
- [ICCV 2023] This is for the paper "Deep Homography Mixture for Single Image Rolling Shutter Correction".☆13May 25, 2025Updated 9 months ago
- Sample projects to showcase the Unity Meta XR Interaction SDK.☆42Feb 28, 2026Updated 3 weeks ago
- [CVPR 2025] Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors☆16Jun 6, 2025Updated 9 months ago
- Official Repository for VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning☆25Apr 12, 2024Updated last year
- [NeurIPS 2023] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models☆22Oct 21, 2025Updated 5 months ago
- ☆24Nov 20, 2025Updated 4 months ago
- BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c…☆40Apr 15, 2025Updated 11 months ago
- A reconstruction framework for materializing subjective experiences from brain signals☆14Jan 18, 2025Updated last year
- ☆24Jan 12, 2026Updated 2 months ago
- Multi-modality Hierarchical Recall based on GBDTs for Bipolar Disorder Classification☆10Jul 12, 2023Updated 2 years ago
- AAAI 2024-Controllable Mind Visual Diffusion Model☆16Dec 18, 2023Updated 2 years ago
- [ICCV 2025] Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction☆26Oct 27, 2025Updated 4 months ago
- This repository is the official implementation of TimeHC-RL (Distilabel (Data Generation) + TRL (SFT) + VeRL (GRPO)).☆48Jun 4, 2025Updated 9 months ago
- [ECCV2024] Nonverbal Interaction Detection☆29Oct 30, 2024Updated last year
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆62Jun 6, 2025Updated 9 months ago
- code for the paper Imitation Learning from Observation with Automatic Discount Scheduling☆13Mar 27, 2024Updated last year
- Unifying 2D and 3D Vision-Language Understanding☆120Jul 23, 2025Updated 8 months ago
- Toolkit for TRoVE, for generating synthetic dataset from real-world annotations and scenes. Accepted at #ECCV2022☆12Jul 20, 2022Updated 3 years ago
- ☆13Apr 23, 2025Updated 11 months ago
- Computed Appraisals Model. Code and data for the 2023 paper, "Emotion prediction as computation over a generative theory of mind"☆13Jun 12, 2023Updated 2 years ago
- DARMA: Software for Dual Axis Rating and Media Annotation☆12Nov 28, 2022Updated 3 years ago
- Provides the Pepper robot with conversational abilities for handling free, open-domain dialogue.☆26Feb 5, 2024Updated 2 years ago
- This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…☆13May 25, 2023Updated 2 years ago
- This code is submitted to ICCV Workshop 2017: Fake vs. true facial emotion recognition competition☆11Oct 17, 2017Updated 8 years ago
- #2019 Micro-expression Grand Challenge☆12Dec 23, 2019Updated 6 years ago
- ☆10Jun 12, 2023Updated 2 years ago
- This code submission for the ICCV 17 Real Versus Fake Expressed Emotion Challenge provides source code to extract the features and classi…☆11Aug 28, 2017Updated 8 years ago
- [ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring☆25Aug 8, 2025Updated 7 months ago
- ☆32Sep 19, 2025Updated 6 months ago
- A CLI version for viewing OTP Crash Dumps☆68Mar 13, 2026Updated last week
- [ICLR 2026] FOCUS: Efficient Keyframe Selection for Long Video Understanding☆56Feb 3, 2026Updated last month
- Data release for Step Differences in Instructional Video (CVPR24)☆14Jun 19, 2024Updated last year
- Official code base for "Long-Tailed Diffusion Models With Oriented Calibration" ICLR2024☆16Jul 11, 2024Updated last year
- ☆80Jan 22, 2026Updated 2 months ago
- Annotated Tutorial for PerAct☆19Sep 11, 2023Updated 2 years ago
- ☆40Jun 6, 2025Updated 9 months ago