Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
☆214Oct 12, 2025Updated 4 months ago
Alternatives and similar repositories for MiniVeo3-Reasoner
Users who are interested in MiniVeo3-Reasoner are comparing it to the libraries listed below
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show…☆53Feb 4, 2026Updated last month
- ☆214Dec 19, 2025Updated 2 months ago
- This is a framework for evaluating reasoning in foundational Video Models.☆74Feb 24, 2026Updated last week
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934☆225Oct 28, 2025Updated 4 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆270Feb 21, 2026Updated last week
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆135Jun 10, 2025Updated 8 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆78Feb 13, 2026Updated 3 weeks ago
- A collection of awesome think with videos papers.☆90Dec 1, 2025Updated 3 months ago
- Harbin Institute of Technology, Spring 2023 Compiler Systems course: lab assignments, exercises, slides, and final-exam review materials☆11Jul 30, 2023Updated 2 years ago
- [CVPR 2025] TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing☆180May 22, 2025Updated 9 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆476Jan 17, 2025Updated last year
- AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation☆31Feb 23, 2026Updated last week
- [CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC"☆37Nov 26, 2025Updated 3 months ago
- This repository contains the code for the paper - "Aligning Text, Images, and 3D Structure Token-by-Token" (CVPR 2026)☆44Jun 11, 2025Updated 8 months ago
- Retargeting of whole-body human motion to humanoid robots for dexterous manipulation of articulated objects.☆25Jan 28, 2026Updated last month
- Official Implementation of Paper: WMPO: World Model-based Policy Optimization for Vision-Language-Action Models☆174Jan 4, 2026Updated 2 months ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)☆693Sep 24, 2025Updated 5 months ago
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence☆279Updated this week
- Code repository for "ZeroShape: Regression-based Zero-shot Shape Reconstruction".☆137Jul 18, 2024Updated last year
- A list of works on video generation towards world model☆355Feb 11, 2026Updated 3 weeks ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)☆53May 8, 2025Updated 9 months ago
- [CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text☆53Mar 16, 2025Updated 11 months ago
- official repo for `thinking with images through-self-calling`☆21Dec 28, 2025Updated 2 months ago
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference☆36Oct 29, 2025Updated 4 months ago
- Reinforcing Text-Rich Video Reasoning with Visual Rumination☆27Nov 24, 2025Updated 3 months ago
- A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science.☆149Feb 26, 2026Updated last week
- ☆39Oct 29, 2025Updated 4 months ago
- logit lens for VGGT☆26Dec 2, 2025Updated 3 months ago
- ICML2025☆63Aug 28, 2025Updated 6 months ago
- Code release for paper "Test-Time Training Done Right"☆385Jan 5, 2026Updated 2 months ago
- Minute-long video generation at 24FPS.☆50Feb 2, 2026Updated last month
- [CVPR 2026] Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration☆35Feb 25, 2026Updated last week
- A set of grasshopper components to quadrangulate tri-meshes using a graph matching approach.☆10May 26, 2023Updated 2 years ago
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation☆14Jan 2, 2026Updated 2 months ago
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation☆27Dec 12, 2025Updated 2 months ago
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video☆220May 25, 2025Updated 9 months ago
- Official code for NeurIPS 2024 paper LRM-Zero: Training Large Reconstruction Models with Synthesized Data☆153Oct 7, 2024Updated last year
- [ICCV 2025] Video-T1: Test-Time Scaling for Video Generation☆306Jun 29, 2025Updated 8 months ago
- [NeurIPS 2025] The official code for "IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation"☆22Jun 5, 2025Updated 9 months ago