Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆60 · Nov 27, 2025 · Updated 3 months ago
Alternatives and similar repositories for UniSandBox
Users interested in UniSandBox are comparing it to the libraries listed below.
- Official implementation of the paper "Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance" (WACV 2025) ☆16 · Mar 5, 2025 · Updated last year
- ☆21 · Feb 13, 2026 · Updated 3 weeks ago
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis" ☆16 · Sep 2, 2024 · Updated last year
- [ICCV'25] ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment ☆36 · Oct 5, 2025 · Updated 5 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆78 · Feb 13, 2026 · Updated 3 weeks ago
- [ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench… ☆87 · Jan 26, 2026 · Updated last month
- LLMBind: A Unified Modality-Task Integration Framework ☆19 · Jun 16, 2024 · Updated last year
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆158 · Sep 12, 2025 · Updated 5 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆32 · Mar 26, 2025 · Updated 11 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆135 · Apr 9, 2025 · Updated 11 months ago
- [TOG 2025] Order Matters: Learning Element Ordering for Graphic Design Generation ☆20 · Aug 5, 2025 · Updated 7 months ago
- Open knowledge base and course survival guides for NJU School of Intelligence Science and Technology students… ☆30 · Updated this week
- [ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation ☆115 · Oct 7, 2025 · Updated 5 months ago
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?". ☆38 · Dec 5, 2024 · Updated last year
- (CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation ☆27 · Feb 28, 2026 · Updated last week
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation ☆12 · Dec 5, 2025 · Updated 3 months ago
- 【Nature Computational Science 2025🔥】Deep peak property learning for efficient chiral molecules ECD spectra prediction ☆51 · Jan 12, 2025 · Updated last year
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆351 · Jan 8, 2026 · Updated 2 months ago
- [NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark ☆283 · Nov 5, 2025 · Updated 4 months ago
- [TMLR 2025🔥] A survey for the autoregressive models in vision. ☆788 · Nov 8, 2025 · Updated 4 months ago
- [CVPR 2025] GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector ☆16 · Mar 19, 2025 · Updated 11 months ago
- ☆11 · Jan 29, 2023 · Updated 3 years ago
- [NeurIPS 2025] This is the official repository for "RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis" ☆26 · Nov 21, 2025 · Updated 3 months ago
- Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization ☆21 · Jan 27, 2026 · Updated last month
- Official implementation of paper "GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model", ICML 2025 ☆15 · Dec 25, 2025 · Updated 2 months ago
- The official code of "Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling" ☆47 · Feb 26, 2026 · Updated last week
- Neural Sort implementation for "Pixelor: A Competitive Sketching AI Agent. So you think you can sketch?" accepted @ SIGGRAPH Asia 2020 ☆12 · Dec 2, 2020 · Updated 5 years ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆11 · Apr 18, 2025 · Updated 10 months ago
- [ICLR'25] Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions? ☆12 · Apr 11, 2025 · Updated 10 months ago
- ☆14 · Nov 23, 2024 · Updated last year
- EmoCapCLIP: Learning Transferable Facial Emotion Representations from Large-Scale Semantically Rich Captions ☆20 · Jul 29, 2025 · Updated 7 months ago
- [NeurIPS 2025] EOC-Bench, an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocen… ☆22 · Jun 17, 2025 · Updated 8 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 · Dec 14, 2025 · Updated 2 months ago
- Data for evaluating GPT-4V ☆11 · Oct 26, 2023 · Updated 2 years ago
- ☆16 · Sep 4, 2025 · Updated 6 months ago
- Envision3D: One Image to 3D with Anchor Views Interpolation ☆114 · May 16, 2024 · Updated last year
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆65 · Jan 1, 2026 · Updated 2 months ago
- ☆132 · Mar 22, 2025 · Updated 11 months ago
- [CVPR 2024, Highlight] The official implementation of the paper "SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation… ☆48 · Sep 30, 2025 · Updated 5 months ago