Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
☆221Oct 12, 2025Updated 6 months ago
Alternatives and similar repositories for MiniVeo3-Reasoner
Users who are interested in MiniVeo3-Reasoner are comparing it to the libraries listed below.
- This is a framework for evaluating reasoning in foundational Video Models.☆88Apr 1, 2026Updated 2 weeks ago
- We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench show…☆58Feb 4, 2026Updated 2 months ago
- [CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC"☆40Nov 26, 2025Updated 4 months ago
- Harbin Institute of Technology Spring 2023 Compiler Systems course labs, exercises, slides, and final exam review materials☆11Jul 30, 2023Updated 2 years ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934☆238Oct 28, 2025Updated 5 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation☆136Jun 10, 2025Updated 10 months ago
- ☆218Dec 19, 2025Updated 3 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆175Jan 26, 2026Updated 2 months ago
- ☆20Jan 26, 2026Updated 2 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆87Mar 9, 2026Updated last month
- Retargeting of whole-body human motion to humanoid robots for dexterous manipulation of articulated objects.☆28Jan 28, 2026Updated 2 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆291Mar 21, 2026Updated 3 weeks ago
- A list of works on video generation towards world model☆454Mar 21, 2026Updated 3 weeks ago
- Code release for paper "Test-Time Training Done Right"☆433Jan 5, 2026Updated 3 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆478Jan 17, 2025Updated last year
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation☆14Jan 2, 2026Updated 3 months ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)☆709Sep 24, 2025Updated 6 months ago
- 📝The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3"☆22Dec 2, 2025Updated 4 months ago
- ICML2025☆64Aug 28, 2025Updated 7 months ago
- [CVPR 2025] TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing☆186May 22, 2025Updated 10 months ago
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi…☆78May 18, 2025Updated 10 months ago
- Official Implementation of Paper: WMPO: World Model-based Policy Optimization for Vision-Language-Action Models☆197Jan 4, 2026Updated 3 months ago
- [ICCV 2025] Video-T1: Test-Time Scaling for Video Generation☆309Mar 7, 2026Updated last month
- [CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text☆53Mar 16, 2025Updated last year
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)☆55May 8, 2025Updated 11 months ago
- ☆112Jan 8, 2025Updated last year
- A collection of awesome think with videos papers.☆97Dec 1, 2025Updated 4 months ago
- official repo for `thinking with images through-self-calling`☆26Dec 28, 2025Updated 3 months ago
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation☆27Dec 12, 2025Updated 4 months ago
- ☆37Feb 4, 2026Updated 2 months ago
- ☆67Feb 4, 2026Updated 2 months ago
- AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation☆39Feb 23, 2026Updated last month
- logit lens for VGGT☆27Dec 2, 2025Updated 4 months ago
- The official code of Yume☆644Jan 14, 2026Updated 3 months ago
- [CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding☆54Mar 16, 2026Updated last month
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence☆326Apr 9, 2026Updated last week
- This repository contains the code for the paper - "Aligning Text, Images, and 3D Structure Token-by-Token" (CVPR 2026)☆45Jun 11, 2025Updated 10 months ago
- ☆18Aug 21, 2025Updated 7 months ago
- 🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.☆166Mar 16, 2026Updated last month