Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
☆225 · Apr 13, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for MiniVeo3-Reasoner
Users that are interested in MiniVeo3-Reasoner are comparing it to the libraries listed below.
- This is a framework for evaluating reasoning in foundational Video Models. ☆89 · Updated this week
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show… ☆60 · Feb 4, 2026 · Updated 3 months ago
- [CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC" ☆41 · Nov 26, 2025 · Updated 5 months ago
- Harbin Institute of Technology, Spring 2023 Compiler Systems course: labs, exercises, slides, and final-exam review materials ☆11 · Jul 30, 2023 · Updated 2 years ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆136 · Jun 10, 2025 · Updated 10 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934 ☆245 · Oct 28, 2025 · Updated 6 months ago
- ☆219 · Dec 19, 2025 · Updated 4 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆180 · Jan 26, 2026 · Updated 3 months ago
- ☆20 · Jan 26, 2026 · Updated 3 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆88 · Mar 9, 2026 · Updated last month
- Retargeting of whole-body human motion to humanoid robots for dexterous manipulation of articulated objects. ☆30 · Jan 28, 2026 · Updated 3 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆295 · Mar 21, 2026 · Updated last month
- A list of works on video generation towards world model ☆469 · Mar 21, 2026 · Updated last month
- Code release for paper "Test-Time Training Done Right" ☆462 · Jan 5, 2026 · Updated 4 months ago
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation ☆14 · Jan 2, 2026 · Updated 4 months ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) ☆713 · Sep 24, 2025 · Updated 7 months ago
- 📝The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3" ☆22 · Dec 2, 2025 · Updated 5 months ago
- ICML2025 ☆64 · Aug 28, 2025 · Updated 8 months ago
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi… ☆77 · May 18, 2025 · Updated 11 months ago
- [CVPR 2025] TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing ☆186 · May 22, 2025 · Updated 11 months ago
- Official Implementation of Paper: WMPO: World Model-based Policy Optimization for Vision-Language-Action Models ☆201 · Jan 4, 2026 · Updated 4 months ago
- [ICCV 2025] Video-T1: Test-Time Scaling for Video Generation ☆311 · Mar 7, 2026 · Updated 2 months ago
- [CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text ☆53 · Mar 16, 2025 · Updated last year
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆56 · May 8, 2025 · Updated 11 months ago
- ☆112 · Jan 8, 2025 · Updated last year
- A collection of awesome think with videos papers. ☆98 · Dec 1, 2025 · Updated 5 months ago
- official repo for `thinking with images through-self-calling` ☆26 · Dec 28, 2025 · Updated 4 months ago
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation ☆27 · Dec 12, 2025 · Updated 4 months ago
- ☆68 · Feb 4, 2026 · Updated 3 months ago
- AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation ☆42 · Feb 23, 2026 · Updated 2 months ago
- logit lens for VGGT ☆27 · Dec 2, 2025 · Updated 5 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding ☆59 · Mar 16, 2026 · Updated last month
- The official code of Yume ☆655 · Jan 14, 2026 · Updated 3 months ago
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence ☆340 · Apr 16, 2026 · Updated 3 weeks ago
- This repository contains the code for the paper - "Aligning Text, Images, and 3D Structure Token-by-Token" (CVPR 2026) ☆44 · Jun 11, 2025 · Updated 10 months ago
- ☆18 · Aug 21, 2025 · Updated 8 months ago
- Evaluate the Quality of Critique ☆37 · Jun 1, 2024 · Updated last year
- 🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks. ☆174 · Updated this week
- ☆40 · Feb 4, 2026 · Updated 3 months ago