PKU-YuanGroup / UniSandBox
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆60 · Nov 27, 2025 · Updated 2 months ago
Alternatives and similar repositories for UniSandBox
Users interested in UniSandBox are comparing it to the repositories listed below.
- ☆21 · Dec 10, 2025 · Updated 2 months ago
- Official implementation of the paper "Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance" (WACV 2025) ☆16 · Mar 5, 2025 · Updated 11 months ago
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis" ☆16 · Sep 2, 2024 · Updated last year
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆69 · Jan 28, 2026 · Updated 2 weeks ago
- [ICCV'25] ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment ☆36 · Oct 5, 2025 · Updated 4 months ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench… ☆87 · Jan 26, 2026 · Updated 2 weeks ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆158 · Sep 12, 2025 · Updated 5 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆32 · Mar 26, 2025 · Updated 10 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆135 · Apr 9, 2025 · Updated 10 months ago
- [TOG 2025] Order Matters: Learning Element Ordering for Graphic Design Generation ☆19 · Aug 5, 2025 · Updated 6 months ago
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?" ☆38 · Dec 5, 2024 · Updated last year
- [ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation ☆116 · Oct 7, 2025 · Updated 4 months ago
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation ☆12 · Dec 5, 2025 · Updated 2 months ago
- Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization ☆20 · Jan 27, 2026 · Updated 2 weeks ago
- SAVL: Scene-Adaptive UAV Visual Localization Using Sparse Feature Extraction and Incremental Descriptor Mapping ☆14 · Aug 6, 2025 · Updated 6 months ago
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space" ☆24 · Jul 21, 2025 · Updated 6 months ago
- ☆14 · Jul 11, 2024 · Updated last year
- 【Nature Computational Science 2025🔥】Deep peak property learning for efficient chiral molecules ECD spectra prediction ☆51 · Jan 12, 2025 · Updated last year
- 📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models. ☆349 · Jan 8, 2026 · Updated last month
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ☆787 · Nov 8, 2025 · Updated 3 months ago
- ☆14 · Nov 23, 2024 · Updated last year
- ☆14 · Jun 2, 2025 · Updated 8 months ago
- [NeurIPS 2025] This is the official repository for "RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis" ☆26 · Nov 21, 2025 · Updated 2 months ago
- EmoCapCLIP: Learning Transferable Facial Emotion Representations from Large-Scale Semantically Rich Captions ☆20 · Jul 29, 2025 · Updated 6 months ago
- This repository will continuously update the latest papers, technical reports, benchmarks about multimodal reasoning! ☆55 · Mar 21, 2025 · Updated 10 months ago
- Codebase for the paper HawkI: Homography & Mutual Information Guidance for 3D-free Single Image to Aerial View ☆13 · Jun 5, 2024 · Updated last year
- ☆11 · Jan 29, 2023 · Updated 3 years ago
- [NeurIPS 2025] EOC-Bench, an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocen… ☆22 · Jun 17, 2025 · Updated 7 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 · Dec 14, 2025 · Updated 2 months ago
- Official implementation of paper "GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model", ICML 2025 ☆15 · Dec 25, 2025 · Updated last month
- Neural Sort implementation for "Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?" accepted @ SIGGRAPH Asia 2020 ☆12 · Dec 2, 2020 · Updated 5 years ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆11 · Apr 18, 2025 · Updated 9 months ago
- [ECCV 2024 Oral] The official implementation of paper: COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation ☆11 · Aug 13, 2024 · Updated last year
- Envision3D: One Image to 3D with Anchor Views Interpolation ☆115 · May 16, 2024 · Updated last year
- Implementation of VLM4VLA ☆115 · Feb 2, 2026 · Updated last week
- Official code for VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator ☆108 · Updated this week
- ☆132 · Mar 22, 2025 · Updated 10 months ago
- [CVPR 2024, Highlight] The official implementation of the paper "SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation… ☆48 · Sep 30, 2025 · Updated 4 months ago
- [SIGGRAPH Asia 2025] Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization ☆34 · Nov 30, 2025 · Updated 2 months ago