KAIST-Visual-AI-Group / APC-VLM
[ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
☆56Updated 4 months ago
Alternatives and similar repositories for APC-VLM
Users who are interested in APC-VLM are comparing it to the libraries listed below
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…☆18Updated last year
- [CVPR 2025] Program synthesis for 3D spatial reasoning☆56Updated 7 months ago
- [CVPR 2025] GPS as a Control Signal for Image Generation☆25Updated 10 months ago
- Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".☆147Updated 4 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆60Updated 7 months ago
- ☆41Updated 8 months ago
- Spatial Aptitude Training for Multimodal Language Models☆24Updated this week
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆60Updated last month
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆83Updated 3 weeks ago
- ☆46Updated 5 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆79Updated this week
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆167Updated 4 months ago
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World"☆124Updated last month
- ☆21Updated last year
- A Large-scale Video Action Dataset☆388Updated 3 weeks ago
- ☆68Updated 3 months ago
- ☆42Updated 7 months ago
- Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"☆49Updated last month
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model☆81Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 11 months ago
- ☆63Updated last month
- Visual Spatial Tuning☆172Updated last week
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Updated 6 months ago
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs☆53Updated last year
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance☆46Updated 8 months ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆79Updated 2 weeks ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning☆57Updated 3 months ago
- ☆124Updated 3 months ago
- Training recipe for SpatialReasoner☆36Updated 4 months ago
- ☆38Updated last year