zhaowei-wang-nlp / DivScene
The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects"
☆17Updated 2 months ago
Alternatives and similar repositories for DivScene
Users that are interested in DivScene are comparing it to the libraries listed below
- Main repo for SimWorld simulator.☆53Updated 3 weeks ago
- ☆69Updated 2 weeks ago
- A paper list for spatial reasoning☆119Updated last month
- A paper list that includes world models or generative video models for embodied agents.☆24Updated 5 months ago
- ☆75Updated 10 months ago
- ☆62Updated last week
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆60Updated 3 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"☆96Updated last week
- Official repo for EscapeCraft (a 3D environment for room escape) and benchmark MM-Escape. This work is accepted by ICCV 2025.☆27Updated last week
- Official Implementation of CAPEAM (ICCV'23)☆13Updated 7 months ago
- ☆21Updated last month
- ☆70Updated 7 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆156Updated 2 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆72Updated last month
- Evaluate Multimodal LLMs as Embodied Agents☆52Updated 5 months ago
- ☆50Updated last month
- ☆49Updated 3 weeks ago
- ☆131Updated last year
- [CVPR2024] This is the official implementation of MP5☆103Updated last year
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding☆41Updated this week
- 🦾 A Dual-System VLA with System2 Thinking☆66Updated last week
- ☆49Updated last year
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆64Updated last month
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆35Updated 2 months ago
- Official Implementation of ReALFRED (ECCV'24)☆42Updated 9 months ago
- [Arxiv Paper 2504.09130]: VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search☆20Updated 2 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]☆22Updated 11 months ago
- Paper collection tracking the continuing line of work that started from World Models.☆176Updated last year
- SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning☆17Updated last month
- [ICLR 2025 Spotlight] Grounding Video Models to Actions through Goal Conditioned Exploration☆49Updated 2 months ago