sinwang20 / D2POLinks
[ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.10480
☆16Updated 6 months ago
Alternatives and similar repositories for D2PO
Users that are interested in D2PO are comparing it to the libraries listed below
Sorting:
- Official code of "RoboOmni: Proactive Robot Manipulation in Omni-modal Context"☆80Updated 2 months ago
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆140Updated 3 weeks ago
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation☆28Updated last month
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]☆179Updated 7 months ago
- [CVPR' 25] Official repo for From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Cal…☆21Updated 7 months ago
- ☆110Updated last year
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆236Updated 3 weeks ago
- Official Repository of LatentSeek☆76Updated 7 months ago
- ☆116Updated 6 months ago
- [CVPR2024] This is the official implement of MP5☆106Updated last year
- A Self-Training Framework for Vision-Language Reasoning☆88Updated last year
- [AAAI 2026] Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks☆39Updated 2 months ago
- ☆88Updated last year
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆68Updated 8 months ago
- ☆24Updated 8 months ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"☆77Updated 11 months ago
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆53Updated 4 months ago
- ☆34Updated 5 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆95Updated 8 months ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning☆38Updated 3 months ago
- ☆21Updated 6 months ago
- [NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains☆71Updated 6 months ago
- We introduce BabyVision, a benchmark revealing the infancy of AI vision.☆162Updated 2 weeks ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆104Updated 4 months ago
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight)☆84Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆84Updated 3 months ago
- [ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.☆87Updated 11 months ago
- Official repo for EscapeCraft (an 3D environment for room escape) and benchmark MM-Escape. This work is accepted by ICCV 2025.☆36Updated 6 months ago
- A RLHF Infrastructure for Vision-Language Models☆193Updated last year
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆59Updated last year