thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star if you find it useful.
☆170 · Updated 2 weeks ago
Alternatives and similar repositories for MiniVeo3-Reasoner
Users who are interested in MiniVeo3-Reasoner are comparing it to the libraries listed below.
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 · Updated 6 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆160 · Updated last month
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models. ☆34 · Updated this week
- ☆30 · Updated 10 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning ☆47 · Updated 2 weeks ago
- Native Multimodal Models are World Learners ☆772 · Updated this week
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆94 · Updated 2 months ago
- Official repository for ReasonGen-R1 ☆71 · Updated 4 months ago
- ☆50 · Updated 10 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆184 · Updated 2 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆126 · Updated 2 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆92 · Updated 2 weeks ago
- ☆21 · Updated last year
- ☆149 · Updated 9 months ago
- [NeurIPS 2024] The official implement of research paper "FreeLong : Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆58 · Updated 3 months ago
- ☆53 · Updated 2 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆75 · Updated 3 months ago
- ☆130 · Updated 2 weeks ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆177 · Updated last month
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps ☆20 · Updated last week
- 🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training ☆142 · Updated this week
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆81 · Updated 8 months ago
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆73 · Updated 3 months ago
- Test-time Scaling for VAR models ☆25 · Updated last month
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation ☆46 · Updated 2 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆25 · Updated last year
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆293 · Updated 2 weeks ago
- PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT ☆145 · Updated last week
- A list of works on video generation towards world model ☆170 · Updated 2 weeks ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆147 · Updated last month