XinrunXu / DeepPHY
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning
☆30 · Updated this week
Alternatives and similar repositories for DeepPHY
Users who are interested in DeepPHY are comparing it to the repositories listed below.
- ☆48 · Updated 3 months ago
- ☆109 · Updated 4 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning", https://arxiv.org/abs/2505.13934 ☆74 · Updated 2 months ago
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula… ☆91 · Updated last month
- ☆60 · Updated 5 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 · Updated 3 weeks ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆35 · Updated last year
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆76 · Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆66 · Updated last year
- Bayes-Adaptive RL for LLM Reasoning ☆36 · Updated 2 months ago
- ☆131 · Updated last year
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆59 · Updated 6 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆70 · Updated last year
- Official code for the paper: WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents ☆38 · Updated 2 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆42 · Updated 3 months ago
- Language Repository for Long Video Understanding ☆32 · Updated last year
- Natural Language Reinforcement Learning ☆92 · Updated last week
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆43 · Updated 3 weeks ago
- ☆51 · Updated last month
- ☆21 · Updated 9 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆104 · Updated 2 months ago
- [ICLR2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models ☆14 · Updated 5 months ago
- Code for "Interactive Task Planning with Language Models" ☆31 · Updated 3 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025) ☆42 · Updated 3 months ago
- [Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification. ☆59 · Updated this week
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆30 · Updated 3 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆26 · Updated 3 months ago
- ☆29 · Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆140 · Updated 8 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆78 · Updated last month