thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star if you find it useful.
☆181 Updated last month
Alternatives and similar repositories for MiniVeo3-Reasoner
Users who are interested in MiniVeo3-Reasoner are comparing it to the repositories listed below
- Official Repo of From Masks to Worlds: A Hitchhiker's Guide to World Models. ☆50 Updated 3 weeks ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆167 Updated last week
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 Updated 6 months ago
- ☆51 Updated 11 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning ☆52 Updated last month
- Official repository for ReasonGen-R1 ☆73 Updated 4 months ago
- ☆132 Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆101 Updated 2 weeks ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆115 Updated last week
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆77 Updated 4 months ago
- ☆30 Updated 11 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆182 Updated 2 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆84 Updated 8 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆127 Updated 3 months ago
- ☆56 Updated 3 months ago
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation ☆46 Updated 2 months ago
- Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆150 Updated last month
- ☆150 Updated 10 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆93 Updated 8 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆191 Updated 3 months ago
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆100 Updated 5 months ago
- Test-time Scaling for VAR models ☆25 Updated 2 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆182 Updated last week
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆25 Updated last year
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆78 Updated 3 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆147 Updated 2 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆177 Updated 6 months ago
- ☆51 Updated 2 months ago
- ☆94 Updated 4 months ago
- The code repository of UniRL ☆46 Updated 5 months ago