zhaowei-wang-nlp / DivScene
The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects"
☆17 Updated last month
Alternatives and similar repositories for DivScene
Users who are interested in DivScene are comparing it to the repositories listed below.
- ☆39 Updated this week
- SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning ☆15 Updated last week
- ☆69 Updated 6 months ago
- Official Implementation of CAPEAM (ICCV'23) ☆13 Updated 6 months ago
- A paper list for spatial reasoning ☆82 Updated this week
- Evaluate Multimodal LLMs as Embodied Agents ☆49 Updated 3 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆44 Updated 2 weeks ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆22 Updated 9 months ago
- Official implementation of: Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel ☆25 Updated 5 months ago
- A paper list that includes world models or generative video models for embodied agents. ☆23 Updated 4 months ago
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆26 Updated this week
- Responsible Robotic Manipulation ☆11 Updated last week
- ☆72 Updated 9 months ago
- ☆41 Updated this week
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆30 Updated last month
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆59 Updated 2 months ago
- Official Implementation of ReALFRED (ECCV'24) ☆40 Updated 7 months ago
- Official code for the paper: Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld ☆57 Updated 8 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆39 Updated 4 months ago
- ☆17 Updated 11 months ago
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆65 Updated 3 weeks ago
- ☆20 Updated 3 years ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆64 Updated this week
- ☆48 Updated last year
- Implementation of our ICCV 2023 paper DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation ☆19 Updated last year
- ☆49 Updated 8 months ago
- ☆16 Updated last week
- ☆47 Updated last week
- EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments ☆15 Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆85 Updated 9 months ago