bruno686 / VisPlay
☆40 Updated 2 months ago
Alternatives and similar repositories for VisPlay
Users who are interested in VisPlay are comparing it to the repositories listed below
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 Updated 2 months ago
- This is the official repository of InfiniteVL ☆76 Updated last month
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆48 Updated this week
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆77 Updated 2 months ago
- ☆62 Updated 2 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 Updated 6 months ago
- ☆204 Updated last month
- [NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason… ☆152 Updated 4 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆206 Updated 3 months ago
- A collection of awesome think with videos papers. ☆83 Updated last month
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 Updated 6 months ago
- ☆63 Updated 6 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆37 Updated last year
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆90 Updated 6 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆95 Updated 8 months ago
- ☆61 Updated 3 weeks ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆34 Updated 2 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆93 Updated 2 weeks ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆233 Updated 5 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆74 Updated last week
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25] ☆179 Updated 7 months ago
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆140 Updated 3 weeks ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning ☆38 Updated 3 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" ☆38 Updated last week
- ☆41 Updated 7 months ago
- We introduce BabyVision, a benchmark revealing the infancy of AI vision. ☆162 Updated 2 weeks ago
- Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization ☆307 Updated 3 weeks ago
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos ☆63 Updated 4 months ago
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆37 Updated 7 months ago
- More reliable Video Understanding Evaluation ☆13 Updated 4 months ago