yyyybq / Awesome-Spatial-Reasoning
A paper list for spatial reasoning
☆58Updated last month
Alternatives and similar repositories for Awesome-Spatial-Reasoning:
Users who are interested in Awesome-Spatial-Reasoning are comparing it to the repositories listed below
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆112Updated last week
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆84Updated 8 months ago
- Spatial-R1: The first MLLM trained using GRPO for spatial reasoning in videos☆31Updated this week
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆55Updated last month
- Accepted by CVPR 2024☆33Updated 11 months ago
- [World-Model-Survey-2024] Paper list and projects for World Model☆9Updated 6 months ago
- ☆82Updated last month
- Collections of Papers and Projects for Multimodal Reasoning.☆104Updated 2 weeks ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆21Updated 3 months ago
- ☆24Updated 2 months ago
- ☆15Updated last week
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆58Updated last month
- ☆43Updated last month
- ☆116Updated 2 months ago
- Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.☆123Updated 2 weeks ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆48Updated last month
- RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints☆43Updated 3 weeks ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆181Updated last week
- ☆69Updated 5 months ago
- A python script for downloading huggingface datasets and models.☆19Updated 3 weeks ago
- ☆9Updated last month
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆28Updated 5 months ago
- The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs"☆51Updated last month
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆36Updated 3 weeks ago
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".☆95Updated 2 weeks ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆81Updated last month
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆56Updated last month
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning!☆36Updated last month
- ☆12Updated last month
- Official repo for EscapeCraft (a 3D environment for room escape) and benchmark MM-Escape☆15Updated 2 weeks ago