We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench shows that fine-tuned video models consistently outperform strong VLMs on long-horizon spatial planning tasks.
☆65 · Feb 4, 2026 · Updated 4 months ago
Alternatives and similar repositories for VR-Bench
Users that are interested in VR-Bench are comparing it to the libraries listed below.
- GitHub repository for "Internalizing World Models via Self-Play Finetuning for Agentic RL" ☆35 · Nov 1, 2025 · Updated 8 months ago
- On-policy distillation built on top of Verl ☆87 · May 25, 2026 · Updated last month
- ☆19 · Jul 31, 2025 · Updated 11 months ago
- Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards" ☆30 · Apr 13, 2026 · Updated 2 months ago
- [ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs" ☆16 · May 24, 2025 · Updated last year
- ☆14 · Mar 4, 2022 · Updated 4 years ago
- ☆68 · Dec 11, 2025 · Updated 6 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated last year
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆229 · Apr 13, 2026 · Updated 2 months ago
- ☆13 · Feb 25, 2025 · Updated last year
- [ACL 2026] From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning ☆40 · Updated this week
- Video URL transcriber and translator using AI. Download from YouTube and translate automatically by adding subtitles to the video ☆21 · Nov 29, 2024 · Updated last year
- ☆20 · Jan 26, 2026 · Updated 5 months ago
- Environments by the Prime Intellect Research Team ☆67 · Updated this week
- 🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior… ☆13 · Dec 5, 2023 · Updated 2 years ago
- ChartSum is a large scale benchmark for automatic chart to text summarization ☆11 · Jul 20, 2023 · Updated 2 years ago
- Modality Gap Theory ☆74 · May 16, 2026 · Updated last month
- VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26] ☆17 · Jun 1, 2026 · Updated last month
- What does the bot say? ACL 2024 ☆28 · Aug 27, 2024 · Updated last year
- ☆113 · Dec 30, 2025 · Updated 6 months ago
- ☆15 · Jan 9, 2026 · Updated 5 months ago
- This is the source code of F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental … ☆11 · Oct 19, 2024 · Updated last year
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark ☆30 · Apr 4, 2026 · Updated 2 months ago
- The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… ☆16 · Feb 27, 2025 · Updated last year
- Universal memory runtime for AI agents ☆51 · Jun 25, 2026 · Updated last week
- [ICLR'26] SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models ☆40 · Mar 9, 2026 · Updated 3 months ago
- ☆219 · Dec 19, 2025 · Updated 6 months ago
- [CVPR 2023] Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning ☆22 · Jun 11, 2023 · Updated 3 years ago
- ☆22 · Aug 18, 2024 · Updated last year
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation ☆22 · Jul 2, 2024 · Updated 2 years ago
- ☆23 · Jul 23, 2025 · Updated 11 months ago
- Measuring RAG solutions throughput and latency ☆20 · Jul 23, 2024 · Updated last year
- Jupyter Hub Support in VS Code ☆17 · Jun 16, 2026 · Updated 2 weeks ago
- An Enterprise LLM chat system using LibreChat, AWS Bedrock and LDAP/AD Authentication ☆16 · Mar 5, 2026 · Updated 3 months ago
- ☆89 · Feb 5, 2026 · Updated 4 months ago
- Visual and Embodied Concepts evaluation benchmark ☆21 · Oct 10, 2023 · Updated 2 years ago
- DataMosaic: Explainable and Verifiable Document-Based Data Analytics ☆20 · Jun 30, 2025 · Updated last year
- A Split Tunneling Solution through Tailscale based on domain matching ☆20 · Jan 8, 2026 · Updated 5 months ago
- A ZLE function that can create codex suggestions ☆11 · Nov 30, 2022 · Updated 3 years ago