We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench shows that fine-tuned video models consistently outperform strong VLMs on long-horizon spatial planning tasks.
☆56 · Feb 4, 2026 · Updated last month
Alternatives and similar repositories for VR-Bench
Users that are interested in VR-Bench are comparing it to the libraries listed below.
- Github repository for "Internalizing World Models via Self-Play Finetuning for Agentic RL" ☆33 · Nov 1, 2025 · Updated 4 months ago
- [TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors ☆15 · Apr 23, 2025 · Updated 11 months ago
- A multi-agent framework to help with your homework. ☆11 · Mar 1, 2025 · Updated last year
- [AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning ☆23 · Dec 2, 2025 · Updated 3 months ago
- ☆60 · Dec 11, 2025 · Updated 3 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 9 months ago
- Video URL transcriber and translator using AI. Download from Youtube and translate automatically by adding subtitles to the video ☆20 · Nov 29, 2024 · Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- Modality Gap–Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models ☆53 · Feb 23, 2026 · Updated last month
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆26 · Feb 9, 2025 · Updated last year
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 6 months ago
- ☆16 · Jun 10, 2025 · Updated 9 months ago
- 🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior… ☆13 · Dec 5, 2023 · Updated 2 years ago
- ☆97 · Dec 30, 2025 · Updated 2 months ago
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought ☆27 · Feb 14, 2026 · Updated last month
- What does the bot say? ACL 2024 ☆27 · Aug 27, 2024 · Updated last year
- The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… ☆15 · Feb 27, 2025 · Updated last year
- [ACMMM25] Crisp-sam2: Sam2 with cross-modal interaction and semantic prompting for multi-organ segmentation ☆33 · Jul 6, 2025 · Updated 8 months ago
- [ECCV 2024] Teach CLIP to Develop a Number Sense for Ordinal Regression ☆19 · Apr 1, 2025 · Updated 11 months ago
- ☆213 · Dec 19, 2025 · Updated 3 months ago
- [ICLR'26] Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? ☆49 · Mar 9, 2026 · Updated 2 weeks ago
- [CVPR 2023] Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning ☆22 · Jun 11, 2023 · Updated 2 years ago
- ☆20 · Jul 23, 2025 · Updated 8 months ago
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation ☆21 · Jul 2, 2024 · Updated last year
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆29 · Jul 9, 2025 · Updated 8 months ago
- Visual and Embodied Concepts evaluation benchmark ☆21 · Oct 10, 2023 · Updated 2 years ago
- AdaSociety is a customizable multi-agent environment featuring expanding state and action spaces, alongside explicit and alterable social… ☆67 · Jul 1, 2025 · Updated 8 months ago
- A ZLE function that can create codex suggestions ☆11 · Nov 30, 2022 · Updated 3 years ago
- Autonomous AI backend for deep research AI applications. ☆42 · Mar 12, 2026 · Updated last week
- This is a framework for evaluating reasoning in foundational Video Models. ☆81 · Mar 7, 2026 · Updated 2 weeks ago
- Sequential Parameter Optimization in Python ☆14 · Jan 12, 2026 · Updated 2 months ago
- Official repository for the paper "Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning" and the SciEvo benchmark. ☆53 · Jan 13, 2026 · Updated 2 months ago
- Claude Code agent that routes to external LLMs (Grok, Gemini, GPT-5, etc.) via OpenRouter - just mention the model name ☆29 · Nov 30, 2025 · Updated 3 months ago
- Token-Oriented Object Notation ☆27 · Nov 2, 2025 · Updated 4 months ago
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry ☆47 · Oct 9, 2025 · Updated 5 months ago
- Scaling Agentic Environments Automatically. ☆54 · Jan 22, 2026 · Updated 2 months ago
- OLD Codebase for Intelligent Systems 2020 and Project AI, Vrije Universiteit Amsterdam ☆12 · Jan 10, 2023 · Updated 3 years ago
- Measuring General Intelligence With Generated Games (Preprint) ☆25 · Jul 30, 2025 · Updated 7 months ago
- Official implementation of CVPR2023 paper "Bi-directional distribution alignment for transductive zero-shot learning" ☆34 · Apr 10, 2024 · Updated last year