cambridgeltl / topviewrs
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners (EMNLP 2024 Oral)
☆15 Updated 5 months ago
Alternatives and similar repositories for topviewrs
Users who are interested in topviewrs are comparing it to the repositories listed below.
- ☆19 Updated 9 months ago
- Evaluate Multimodal LLMs as Embodied Agents ☆54 Updated 9 months ago
- Official Code for "Learning to Reason via Mixture-of-Thought for Logical Reasoning" ☆25 Updated 3 weeks ago
- ☆52 Updated 7 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆105 Updated last year
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆22 Updated last month
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning ☆33 Updated last month
- Official Repo of LangSuitE ☆84 Updated last year
- LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents (ICLR 2024) ☆82 Updated 6 months ago
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆163 Updated 3 weeks ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25) ☆46 Updated 7 months ago
- ☆32 Updated last year
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆33 Updated 2 weeks ago
- ☆54 Updated last year
- ☆133 Updated last year
- ☆44 Updated 6 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆62 Updated 7 months ago
- [ICML 2024] Language Models Represent Beliefs of Self and Others ☆35 Updated last year
- Code for "Interactive Task Planning with Language Models" ☆32 Updated 7 months ago
- Bayes-Adaptive RL for LLM Reasoning ☆41 Updated 6 months ago
- Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training ☆45 Updated 3 months ago
- ☆21 Updated last year
- Scaffold Prompting to promote LMMs ☆45 Updated 11 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆44 Updated 3 months ago
- [ICLR 2024] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆46 Updated 6 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆13 Updated 8 months ago
- ☆17 Updated 11 months ago
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆78 Updated 6 months ago
- The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects" ☆19 Updated 7 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆47 Updated last year