[ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.10480
☆17Jul 22, 2025Updated 8 months ago
Alternatives and similar repositories for D2PO
Users who are interested in D2PO are comparing it to the repositories listed below
- ☆21Jul 22, 2025Updated 8 months ago
- Bringing some SQL to Qdrant☆15Jun 17, 2025Updated 9 months ago
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…☆40Feb 20, 2025Updated last year
- ☆12Jul 4, 2024Updated last year
- An open-source personal academic homepage template characterized by its user-friendly design and extensive scalability.☆36Oct 6, 2025Updated 5 months ago
- ☆11Dec 6, 2024Updated last year
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.☆11Jul 16, 2024Updated last year
- [NeurIPS 2025 Spotlight] Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents"☆53May 21, 2025Updated 10 months ago
- ☆14Dec 25, 2024Updated last year
- ☆17May 29, 2022Updated 3 years ago
- ☆21Jan 15, 2026Updated 2 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆283Feb 21, 2026Updated last month
- Repo for Llatrieval☆31Aug 21, 2024Updated last year
- [ICCV 2025] AdsQA: Towards Advertisement Video Understanding Arxiv: https://arxiv.org/abs/2509.08621☆34Oct 30, 2025Updated 4 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆19Jun 27, 2024Updated last year
- Responsible Robotic Manipulation☆16Aug 31, 2025Updated 6 months ago
- MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i…☆166Mar 6, 2026Updated 2 weeks ago
- Collection of papers about video-audio understanding☆24Dec 26, 2025Updated 2 months ago
- The frontend of ZVMS 4, powered by Element-plus, Vite, and Vue.☆10Feb 11, 2026Updated last month
- ☆22May 3, 2025Updated 10 months ago
- ☆22Jan 26, 2024Updated 2 years ago
- An unofficial Fudan slide theme for Typst.☆16Mar 19, 2024Updated 2 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- PICABench: How Far Are We from Physically Realistic Image Editing?☆36Nov 5, 2025Updated 4 months ago
- Gradient-based Next-best-view Planning☆17Nov 20, 2024Updated last year
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation".☆27Oct 13, 2024Updated last year
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation☆27Dec 12, 2025Updated 3 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆82Jun 17, 2024Updated last year
- ☆41Dec 9, 2025Updated 3 months ago
- Source codes and data for our IJCAI 2021 paper "Consistent Inference for Dialogue Relation Extraction".☆24Nov 27, 2021Updated 4 years ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆55Nov 29, 2024Updated last year
- Official implementation of "ScoreNet: Learning Non-Uniform Attention and Augmentation for Transformer-Based Histopathological Image Class…☆12Mar 6, 2023Updated 3 years ago
- [NAACL 2025] SIUO: Cross-Modality Safety Alignment☆124Jan 31, 2025Updated last year
- ☆19Dec 23, 2024Updated last year
- [ICCV 2025] CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games☆51Nov 19, 2025Updated 4 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆32Jun 23, 2025Updated 8 months ago
- Code for the paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents"☆65Feb 25, 2025Updated last year
- Co-Attention Aligned Mutual Cross-Attention for Cloth-Changing Person Re-Identification [ACCV 2022 Oral]☆17Dec 26, 2024Updated last year
- The code repository for "Co-Transport for Class-Incremental Learning" (ACM MM'21) in PyTorch.☆13Dec 24, 2021Updated 4 years ago