Doodling our way to AGI
☆126 · May 29, 2025 · Updated last year
Alternatives and similar repositories for thinking-with-generated-images
Users that are interested in thinking-with-generated-images are comparing it to the libraries listed below.
- [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos ☆27 · Aug 8, 2025 · Updated 9 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆1,466 · Mar 9, 2026 · Updated 2 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆95 · Jul 13, 2025 · Updated 10 months ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence. ☆37 · Dec 30, 2025 · Updated 5 months ago
- ☆19 · Jan 26, 2025 · Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆149 · Apr 9, 2025 · Updated last year
- [NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models acro… ☆119 · Feb 10, 2026 · Updated 3 months ago
- ☆1,215 · Nov 20, 2025 · Updated 6 months ago
- More reliable Video Understanding Evaluation ☆15 · Sep 23, 2025 · Updated 8 months ago
- ☆51 · Oct 29, 2023 · Updated 2 years ago
- ☆48 · May 16, 2026 · Updated 2 weeks ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆19 · Jul 1, 2025 · Updated 10 months ago
- Official code repository of Shuffle-R1 ☆26 · Feb 23, 2026 · Updated 3 months ago
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang ☆44 · Nov 19, 2025 · Updated 6 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆185 · May 1, 2026 · Updated 3 weeks ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality ☆45 · May 19, 2026 · Updated last week
- ☆127 · Jul 22, 2025 · Updated 10 months ago
- EARL: Editing with Autoregression and RL ☆42 · Nov 21, 2025 · Updated 6 months ago
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio… ☆114 · Apr 13, 2026 · Updated last month
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models ☆41 · Jan 5, 2026 · Updated 4 months ago
- [Extended version ICLR 2025 Blog Track] Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generatio… ☆840 · Jun 16, 2025 · Updated 11 months ago
- A hand-made OS core for National College Students Computer System Ability Competition ☆11 · Aug 28, 2021 · Updated 4 years ago
- ☆30 · Jul 2, 2024 · Updated last year
- SFT+RL boosts multimodal reasoning ☆49 · Jun 27, 2025 · Updated 11 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆94 · Aug 8, 2025 · Updated 9 months ago
- [NeurIPS 2025 D&B Track] MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research ☆29 · May 8, 2026 · Updated 3 weeks ago
- [ICLR26] GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆106 · Jan 27, 2026 · Updated 4 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆42 · Dec 29, 2025 · Updated 5 months ago
- [CVPR 2026] UnicEdit-10M and UnicBench project ☆41 · Mar 3, 2026 · Updated 2 months ago
- ☆22 · Apr 15, 2025 · Updated last year
- [ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection ☆13 · Apr 12, 2024 · Updated 2 years ago
- [CVPR 2026] Official implementation of FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-and-Language Navigation ☆31 · Feb 23, 2026 · Updated 3 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ☆236 · May 30, 2025 · Updated last year
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆463 · Aug 8, 2025 · Updated 9 months ago
- [ACL 2026 Findings] "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning" ☆62 · Updated this week
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation" ☆25 · Mar 9, 2024 · Updated 2 years ago
- ☆14 · Jan 22, 2025 · Updated last year
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex ☆777 · Mar 19, 2026 · Updated 2 months ago