We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench shows that fine-tuned video models consistently outperform strong VLMs on long-horizon spatial planning tasks.
☆57 · Feb 4, 2026 · Updated 2 months ago
Alternatives and similar repositories for VR-Bench
Users that are interested in VR-Bench are comparing it to the libraries listed below.
- [TPAMI2025] BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors ☆15 · Apr 23, 2025 · Updated 11 months ago
- ☆18 · Jul 31, 2025 · Updated 8 months ago
- Personalized Image Generation with Large Multimodal Models ☆15 · May 13, 2025 · Updated 10 months ago
- [ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs" ☆16 · May 24, 2025 · Updated 10 months ago
- ☆14 · Mar 4, 2022 · Updated 4 years ago
- ☆12 · Jun 20, 2023 · Updated 2 years ago
- ☆13 · Feb 25, 2025 · Updated last year
- v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning ☆19 · Oct 6, 2025 · Updated 6 months ago
- [MICCAI 2025] Bridging the Gap in Missing Modalities: Leveraging Knowledge Distillation and Style Matching for Brain Tumor Segmentation ☆19 · Jul 13, 2025 · Updated 8 months ago
- Official Implementation of DMT: Dual Mean-Teacher in PyTorch. ☆10 · Oct 27, 2023 · Updated 2 years ago
- ☆19 · Jan 26, 2026 · Updated 2 months ago
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆26 · Feb 9, 2025 · Updated last year
- ☆17 · Jun 10, 2025 · Updated 10 months ago
- More reliable Video Understanding Evaluation ☆15 · Sep 23, 2025 · Updated 6 months ago
- Towards Accurate and Lightweight Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis ☆16 · Mar 10, 2026 · Updated last month
- Code and dataset for the ICLR 2024 paper "Thought Propagation: An analogical Approach to Complex Reasoning with Large Language Models." ☆16 · Mar 4, 2024 · Updated 2 years ago
- ChartSum is a large scale benchmark for automatic chart to text summarization ☆11 · Jul 20, 2023 · Updated 2 years ago
- ☆99 · Dec 30, 2025 · Updated 3 months ago
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought ☆29 · Feb 14, 2026 · Updated last month
- Tongji University data mining course final project: stock trend prediction ☆10 · Jan 11, 2021 · Updated 5 years ago
- ☆15 · Jan 9, 2026 · Updated 3 months ago
- [ECCV 2024] Teach CLIP to Develop a Number Sense for Ordinal Regression ☆19 · Apr 1, 2025 · Updated last year
- [ACMMM25] Crisp-sam2: Sam2 with cross-modal interaction and semantic prompting for multi-organ segmentation ☆33 · Jul 6, 2025 · Updated 9 months ago
- Code for Mind the Label Shift of Augmentation-based Graph OOD generalization (LiSA) in CVPR 2023. LiSA is a model-agnostic Graph OOD fram… ☆16 · Jun 24, 2023 · Updated 2 years ago
- ☆217 · Dec 19, 2025 · Updated 3 months ago
- [ICLR'26] SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models ☆38 · Mar 9, 2026 · Updated last month
- [ICLR'26] Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? ☆51 · Mar 9, 2026 · Updated last month
- Measuring RAG solutions throughput and latency ☆20 · Jul 23, 2024 · Updated last year
- ☆79 · Feb 5, 2026 · Updated 2 months ago
- ☆20 · Jul 23, 2025 · Updated 8 months ago
- Jupyter Hub Support in VS Code ☆17 · Apr 2, 2026 · Updated last week
- An Enterprise LLM chat system using LibreChat, AWS Bedrock and LDAP/AD Authentication ☆16 · Mar 5, 2026 · Updated last month
- DataMosaic: Explainable and Verifiable Document-Based Data Analytics ☆20 · Jun 30, 2025 · Updated 9 months ago
- A Split Tunneling Solution through Tailscale based on domain matching ☆18 · Jan 8, 2026 · Updated 3 months ago
- Official repository for the paper "Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning" and the SciEvo benchmark. ☆44 · Jan 13, 2026 · Updated 2 months ago
- This is a framework for evaluating reasoning in foundational Video Models. ☆87 · Apr 1, 2026 · Updated last week
- LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry ☆49 · Oct 9, 2025 · Updated 6 months ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format. ☆26 · Feb 22, 2024 · Updated 2 years ago
- OLD Codebase for Intelligent Systems 2020 and Project AI, Vrije Universiteit Amsterdam ☆12 · Jan 10, 2023 · Updated 3 years ago