Doodling our way to AGI ✏️ 🖼️ 🧠
☆122 · May 29, 2025 · Updated 9 months ago
Alternatives and similar repositories for thinking-with-generated-images
Users that are interested in thinking-with-generated-images are comparing it to the libraries listed below.
- [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos ☆25 · Aug 8, 2025 · Updated 7 months ago
- This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning" ☆65 · Dec 29, 2025 · Updated 2 months ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence. ☆28 · Dec 30, 2025 · Updated 2 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆1,388 · Mar 9, 2026 · Updated 2 weeks ago
- Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images ☆55 · Nov 4, 2025 · Updated 4 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆90 · Jul 13, 2025 · Updated 8 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆129 · Jan 30, 2026 · Updated last month
- ☆17 · Jan 26, 2025 · Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆148 · Apr 9, 2025 · Updated 11 months ago
- [NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models acro… ☆113 · Feb 10, 2026 · Updated last month
- [CVPR 2026] Thinking with Programming Vision: Towards a Unified View for Thinking with Images ☆64 · Jan 23, 2026 · Updated 2 months ago
- ☆1,161 · Nov 20, 2025 · Updated 4 months ago
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 6 months ago
- ☆51 · Oct 29, 2023 · Updated 2 years ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 · Jul 1, 2025 · Updated 8 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆167 · Jan 26, 2026 · Updated 2 months ago
- Official code repository of Shuffle-R1 ☆25 · Feb 23, 2026 · Updated last month
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang ☆44 · Nov 19, 2025 · Updated 4 months ago
- ☆121 · Jul 22, 2025 · Updated 8 months ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality ☆40 · Dec 1, 2025 · Updated 3 months ago
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio… ☆109 · Feb 4, 2026 · Updated last month
- EARL: Editing with Autoregression and RL ☆42 · Nov 21, 2025 · Updated 4 months ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models ☆42 · Jan 5, 2026 · Updated 2 months ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆828 · Jun 16, 2025 · Updated 9 months ago
- [CVPR 2026] UnicEdit-10M and UnicBench project ☆34 · Mar 3, 2026 · Updated 3 weeks ago
- GenEval: An object-focused framework for evaluating text-to-image alignment ☆435 · Mar 3, 2025 · Updated last year
- ☆29 · Jul 2, 2024 · Updated last year
- SFT+RL boosts multimodal reasoning ☆47 · Jun 27, 2025 · Updated 9 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆93 · Aug 8, 2025 · Updated 7 months ago
- [ICLR26] GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆105 · Jan 27, 2026 · Updated 2 months ago
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research ☆24 · Sep 23, 2025 · Updated 6 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated 2 months ago
- ☆20 · Apr 15, 2025 · Updated 11 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ☆239 · May 30, 2025 · Updated 9 months ago
- ☆20 · Jun 16, 2025 · Updated 9 months ago
- "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning" ☆60 · Jan 28, 2026 · Updated last month
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation" ☆26 · Mar 9, 2024 · Updated 2 years ago
- ☆13 · Jan 22, 2025 · Updated last year
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex ☆744 · Mar 19, 2026 · Updated last week