ICML2025, I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
☆193 Sep 7, 2025 Updated 5 months ago
Alternatives and similar repositories for ThinkDiff
Users who are interested in ThinkDiff are comparing it to the repositories listed below
- ☆13 Jun 4, 2025 Updated 8 months ago
- Official code for ICCV2023 paper: Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis ☆34 Dec 27, 2023 Updated 2 years ago
- [CVPR 2025] Official code for "Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation" ☆66 Jun 6, 2025 Updated 8 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆34 Mar 6, 2025 Updated 11 months ago
- [ICLR 2026] Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos ☆26 Jan 26, 2026 Updated last month
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think! ☆121 Mar 4, 2025 Updated 11 months ago
- ☆11 Jul 17, 2024 Updated last year
- Visualizing point clouds with transparency in Switch-NeRF (ICLR2023) ☆13 Mar 27, 2023 Updated 2 years ago
- [NeurIPS'25] HyRF: Hybrid Radiance Fields for Efficient and High-quality Novel View Synthesis ☆69 Dec 17, 2025 Updated 2 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆60 Aug 23, 2024 Updated last year
- GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset ☆245 Aug 15, 2025 Updated 6 months ago
- Official codes for paper: 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detecti… ☆161 Oct 21, 2025 Updated 4 months ago
- [NeurIPS 2024 Spotlight] The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation" ☆138 Oct 8, 2024 Updated last year
- This is the official repo of the paper "Latent Guard: a Safety Framework for Text-to-image Generation" ☆52 Oct 24, 2024 Updated last year
- Implementation code of the paper MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing ☆72 Jul 13, 2025 Updated 7 months ago
- [ICCV 2025] VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen… ☆277 Jan 7, 2026 Updated last month
- Official Repo for Paper "OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision" [ICLR2025] ☆141 Jan 27, 2025 Updated last year
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning ☆270 Apr 15, 2025 Updated 10 months ago
- Codes for GBi-Net (CVPR2022) ☆130 Jul 20, 2023 Updated 2 years ago
- Official code for ECCV2024 paper: GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal ☆104 Nov 25, 2025 Updated 3 months ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆627 Oct 29, 2025 Updated 4 months ago
- ☆20 Mar 3, 2025 Updated 11 months ago
- [AAAI-2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation ☆457 Mar 5, 2025 Updated 11 months ago
- Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs (CVPR2025 Highlight) ☆124 Sep 18, 2025 Updated 5 months ago
- [arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices ☆132 Nov 27, 2025 Updated 3 months ago
- Codes for Switch-NeRF (ICLR 2023) ☆211 Aug 25, 2025 Updated 6 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆309 Sep 28, 2025 Updated 5 months ago
- LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation ☆38 Mar 3, 2025 Updated 11 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆504 Sep 2, 2024 Updated last year
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆996 Nov 25, 2025 Updated 3 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆86 Jul 16, 2024 Updated last year
- [ICLR 2026] Follow-Your-Shape: This repo is the official implementation of "Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-… ☆59 Jan 30, 2026 Updated last month
- ☆27 Apr 11, 2023 Updated 2 years ago
- Official Implementation of "LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis" ☆78 Aug 25, 2025 Updated 6 months ago
- The official implementation of "RepVideo: Rethinking Cross-Layer Representation for Video Generation" ☆124 Jan 25, 2025 Updated last year
- ☆190 Dec 17, 2024 Updated last year
- ☆29 May 7, 2025 Updated 9 months ago
- [CVPR 2026] Official PyTorch implementation of "ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding" ☆17 Dec 17, 2025 Updated 2 months ago
- CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs (CVPR2024) ☆17 Jun 14, 2024 Updated last year