thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star if you find it useful.
☆186 Updated last month
Alternatives and similar repositories for MiniVeo3-Reasoner
Users who are interested in MiniVeo3-Reasoner are comparing it to the libraries listed below.
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆177 Updated 2 weeks ago
- Official Repo of From Masks to Worlds: A Hitchhiker's Guide to World Models. ☆58 Updated last month
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 Updated 7 months ago
- ☆30 Updated last year
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆104 Updated last month
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning ☆53 Updated last month
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆205 Updated 4 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆188 Updated 2 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆216 Updated 2 weeks ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆131 Updated 3 months ago
- ☆51 Updated 11 months ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆126 Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆80 Updated 4 months ago
- ☆135 Updated last month
- Official repository for ReasonGen-R1 ☆73 Updated 5 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆95 Updated last month
- ☆55 Updated 3 months ago
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆100 Updated 6 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆316 Updated last month
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps ☆23 Updated 3 weeks ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆150 Updated 2 months ago
- ☆154 Updated 11 months ago
- The code repository of UniRL ☆46 Updated 6 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 Updated 9 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆85 Updated 9 months ago
- Test-time Scaling for VAR models ☆25 Updated 2 months ago
- Uniform Discrete Diffusion with Metric Path for Video Generation ☆78 Updated 3 weeks ago
- ICML2025 ☆61 Updated 3 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆178 Updated 6 months ago
- ☆51 Updated 3 months ago