visgym / VisGym
Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
☆63 · Updated this week
Alternatives and similar repositories for VisGym
Users who are interested in VisGym are comparing it to the libraries listed below.
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … · ☆83 · Updated last week
- ☆114 · Updated 6 months ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" · ☆45 · Updated 2 years ago
- Code and data for "Does Spatial Cognition Emerge in Frontier Models?" · ☆27 · Updated 9 months ago
- Official implementation of "Self-Improving Video Generation" · ☆78 · Updated 9 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. · ☆98 · Updated 2 weeks ago
- ☆118 · Updated 2 months ago
- Spatial Aptitude Training for Multimodal Language Models · ☆23 · Updated this week
- ☆78 · Updated 8 months ago
- ☆162 · Updated last year
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks · ☆58 · Updated last year
- [CVPR 2025] Program synthesis for 3D spatial reasoning · ☆54 · Updated 7 months ago
- ☆46 · Updated last year
- ☆61 · Updated 3 weeks ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" · ☆125 · Updated 2 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models · ☆33 · Updated last year
- ☆38 · Updated 11 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence · ☆74 · Updated last week
- ☆41 · Updated 7 months ago
- Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation · ☆27 · Updated last month
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos · ☆47 · Updated 6 months ago
- ☆66 · Updated 2 months ago
- This repository is a collection of research papers on World Models. · ☆43 · Updated 2 years ago
- ☆80 · Updated 7 months ago
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs · ☆52 · Updated last year
- Egocentric Video Understanding Dataset (EVUD) · ☆32 · Updated last year
- LogiCity@NeurIPS'24, D&B track. A multi-agent inductive learning environment for "abstractions". · ☆27 · Updated 7 months ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning · ☆38 · Updated 3 months ago
- ElasticTok: Adaptive Tokenization for Image and Video · ☆87 · Updated last year
- Evaluate Multimodal LLMs as Embodied Agents · ☆57 · Updated 11 months ago