USC-GVL / PhysBench
[ICLR 2025] Official implementation and benchmark evaluation repository of "PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding"
☆47 · Updated last month
Alternatives and similar repositories for PhysBench:
Users interested in PhysBench are comparing it to the libraries listed below.
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning ☆37 · Updated 4 months ago
- Code for paper "Grounding Video Models to Actions through Goal Conditioned Exploration". ☆44 · Updated 3 months ago
- ☆126 · Updated 3 months ago
- ☆68 · Updated 7 months ago
- Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks ☆58 · Updated 4 months ago
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs ☆37 · Updated 10 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆63 · Updated this week
- ☆18 · Updated 5 months ago
- IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos ☆40 · Updated 2 weeks ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆71 · Updated 2 months ago
- ☆75 · Updated 7 months ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆100 · Updated 5 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆103 · Updated 3 weeks ago
- ☆49 · Updated 6 months ago
- A paper list that includes world models or generative video models for embodied agents. ☆22 · Updated 3 months ago
- ☆25 · Updated 2 weeks ago
- ☆17 · Updated 9 months ago
- ☆23 · Updated last year
- Program synthesis for 3D spatial reasoning ☆25 · Updated last month
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites. ☆56 · Updated this week
- Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets ☆54 · Updated this week
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆58 · Updated 3 weeks ago
- [NeurIPS 2024] Official code repository for MSR3D paper ☆50 · Updated last month
- ☆46 · Updated 4 months ago
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆71 · Updated 6 months ago
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction ☆28 · Updated 3 months ago
- [arXiv:2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆68 · Updated last month
- ☆25 · Updated 3 months ago
- Evaluate Multimodal LLMs as Embodied Agents ☆43 · Updated 2 months ago
- The PyTorch implementation of paper: "AdaWorld: Learning Adaptable World Models with Latent Actions". ☆53 · Updated 3 weeks ago