KAIST-Visual-AI-Group/APC-VLM

[ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation

☆66

Alternatives and similar repositories for APC-VLM

Users who are interested in APC-VLM are comparing it to the repositories listed below.

KAIST-Visual-AI-Group / VG-AVS
Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection
☆24 • Feb 5, 2026 • Updated 5 months ago
kaist-cvml / geometric-distillation
[EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
☆39 • Jun 12, 2025 • Updated last year
sled-group / COMFORT
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…
☆22 • Oct 24, 2024 • Updated last year
KAIST-Visual-AI-Group / GrounDiT
[NeurIPS 2024] Official Implementation of GrounDiT
☆59 • Dec 12, 2024 • Updated last year
STARE-bench / STARE
☆19 • Oct 12, 2025 • Updated 9 months ago
InternRobotics / MMSI-Bench
[ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
☆103 • Apr 28, 2026 • Updated 2 months ago
stogiannidis / srbench
Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"
☆19 • Feb 1, 2026 • Updated 5 months ago
mengcaopku / SpatialDreamer
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
☆15 • Feb 1, 2026 • Updated 5 months ago
KAIST-Visual-AI-Group / Token-Warping-MLLM
☆22 • Mar 31, 2026 • Updated 3 months ago
KAIST-Visual-AI-Group / StochSync
Official implementation of StochSync: a zero-shot approach for image generation in arbitrary spaces via stochastic diffusion synchronizat…
☆21 • Jun 24, 2025 • Updated last year
Chenyu-Wang567 / All-Angles-Bench
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
☆70 • Mar 22, 2026 • Updated 4 months ago
KAIST-Visual-AI-Group / PDS
Official Implementation of Posterior Distillation Sampling
☆94 • Jul 7, 2025 • Updated last year
KAIST-Visual-AI-Group / ORIGEN
[NeurIPS 2025] Official code for ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
☆32 • Oct 17, 2025 • Updated 9 months ago
mll-lab-nu / Theory-of-Space
THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui…
☆85 • Feb 27, 2026 • Updated 4 months ago
cheolhong0916 / contrastive-probing
☆15 • Jun 19, 2026 • Updated last month
UMass-Embodied-AGI / MindJourney
[NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆151 • Nov 4, 2025 • Updated 8 months ago
KAIST-Visual-AI-Group / SyncTweedies
Official implementation of SyncTweedies: A General Generative Framework Based on Synchronized Diffusions (NeurIPS 2024)
☆69 • Aug 4, 2024 • Updated last year
hunarbatra / SpatialThinker
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
☆40 • Jan 28, 2026 • Updated 5 months ago
NVlabs / RoboSpatial
☆147 • Jun 17, 2026 • Updated last month
KAIST-Visual-AI-Group / PartSTAD
Official implementation of PartSTAD: 2D-to-3D Part Segmentation Task Adaptation (ECCV 2024).
☆56 • Nov 7, 2024 • Updated last year
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆335 • Dec 14, 2024 • Updated last year
mll-lab-nu / MindCube
☆163 • Mar 23, 2026 • Updated 4 months ago
Ugness / ReDi
Official implementation of ReDi: Rectified Discrete Flow (NeurIPS 2025)
☆18 • May 11, 2026 • Updated 2 months ago
KAIST-Visual-AI-Group / Flow-Inference-Time-Scaling
[NeurIPS 2025] Official code for Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
☆75 • Oct 12, 2025 • Updated 9 months ago
facebookresearch / Multi-SpatialMLLM
[CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆178 • Feb 25, 2026 • Updated 4 months ago
zhouyiks / CoLVA
☆44 • Jul 9, 2025 • Updated last year
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98 • Jul 27, 2025 • Updated 11 months ago
zhangzaibin / spagent
SPAgent, a foundation agent for understanding, reasoning over, and operating within the physical and spatial world.
☆205 • Updated this week
fereenwong / cdViews
Official code for "3D Question Answering via only 2D Vision-Language Models"
☆24 • Mar 4, 2026 • Updated 4 months ago
hyungjin-chung / VPS
☆16 • Sep 11, 2025 • Updated 10 months ago
qizekun / OmniSpatial
[ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
☆88 • Jan 21, 2026 • Updated 6 months ago
KAIST-Visual-AI-Group / PairFlow
[ICLR 2026] Official code for PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
☆16 • Jul 3, 2026 • Updated 2 weeks ago
Haochen-Wang409 / TreeVGR
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
☆92 • Jan 26, 2026 • Updated 5 months ago
arijitray1993 / SAT
Spatial Aptitude Training for Multimodal Language Models
☆33 • Feb 8, 2026 • Updated 5 months ago
KAIST-Visual-AI-Group / MatLat
[CVPR 2026 Highlight] Official code for MatLat: Material Latent Space for PBR Texture Generation
☆17 • Jul 16, 2026 • Updated last week
THU-SI / Spatial-MLLM
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆480 • Feb 5, 2026 • Updated 5 months ago
shiqichen17 / AdaptVis
GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆76 • May 2, 2025 • Updated last year
Visual-AI / 3DRS
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆158 • Dec 9, 2025 • Updated 7 months ago
InternRobotics / OV_PARTS
[NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation
☆95 • Jun 24, 2024 • Updated 2 years ago