FoundationAgents / VR-Bench
We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench shows that fine-tuned video models consistently outperform strong VLMs on long-horizon spatial planning tasks.
☆49 Updated 3 weeks ago
Alternatives and similar repositories for VR-Bench
Users who are interested in VR-Bench are comparing it to the repositories listed below.
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆89 Updated 7 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 Updated 6 months ago
- Official implementation of paper "Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models" ☆64 Updated 2 weeks ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆240 Updated 3 months ago
- Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to … ☆57 Updated this week
- 🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL ☆57 Updated 5 months ago
- [NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model ☆64 Updated 3 months ago
- ☆348 Updated 6 months ago
- Official repository for the paper Number Cookbook: Number Understanding of Language Models and How to Improve It. ☆19 Updated 9 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆40 Updated last month
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 Updated 7 months ago
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆53 Updated 4 months ago
- ☆130 Updated this week
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents ☆290 Updated 2 months ago
- ☆16 Updated 3 months ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models? ☆36 Updated last month
- A Self-Training Framework for Vision-Language Reasoning ☆88 Updated last year
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario ☆29 Updated 3 months ago
- Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" ☆120 Updated 2 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆146 Updated 8 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25] ☆179 Updated 7 months ago
- Discriminative Constrained Optimization for Reinforcing Large Reasoning Models ☆50 Updated 2 months ago
- Research works from Tencent AI Lab regarding self-evolving agents ☆81 Updated 5 months ago
- ☆43 Updated 5 months ago
- Official Repository of LatentSeek ☆76 Updated 7 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆41 Updated 8 months ago
- ☆62 Updated 2 months ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆43 Updated last year
- Official Repo for MageBench: Bridging Large Multimodal Models to Agents ☆22 Updated last year
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality ☆60 Updated 6 months ago