Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"
☆60Dec 18, 2025Updated 2 months ago
Alternatives and similar repositories for gca
Users that are interested in gca are comparing it to the libraries listed below
- [EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation☆31Jun 12, 2025Updated 8 months ago
- ☆22Sep 16, 2025Updated 5 months ago
- The code implementation for TTCS: Test-Time Curriculum Synthesis for Self-Evolving.☆32Feb 6, 2026Updated 3 weeks ago
- Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection☆22Feb 5, 2026Updated 3 weeks ago
- STAMP [Accepted by CVPR 2026]: Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask …☆33Feb 21, 2026Updated last week
- Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning☆30Jun 6, 2025Updated 8 months ago
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…☆21Oct 24, 2024Updated last year
- [NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"☆233Dec 16, 2025Updated 2 months ago
- [NeurIPS 2025] Official code for ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation☆33Oct 17, 2025Updated 4 months ago
- Training recipe for SpatialReasoner☆38Sep 21, 2025Updated 5 months ago
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"☆27Jul 10, 2023Updated 2 years ago
- Official implementation of "PyVision-RL: Forging Open Agentic Vision Models via RL."☆61Updated this week
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness☆67Jul 22, 2025Updated 7 months ago
- [CVPR'25] UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image☆38May 29, 2025Updated 9 months ago
- Paper: UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting☆31Jun 5, 2025Updated 8 months ago
- ☆38Jan 8, 2026Updated last month
- [ICLR 2025] Duoduo CLIP: Efficient 3D Understanding with Multi-View Images☆76May 29, 2025Updated 9 months ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'☆204Nov 28, 2025Updated 3 months ago
- ☆71Sep 27, 2022Updated 3 years ago
- ☆68Nov 5, 2025Updated 3 months ago
- ☆10May 19, 2025Updated 9 months ago
- [WACV 2025] D2FP: Learning Implicit Prior for Human Parsing☆16Mar 17, 2025Updated 11 months ago
- We introduce BabyVision, a benchmark revealing the infancy of AI vision.☆184Jan 13, 2026Updated last month
- Communication Relay by creating a WiFi Mesh Network using ROS, and using that network for Data Telemetry, with Telemetry radios ( Ubiquit…☆11Dec 18, 2018Updated 7 years ago
- Reimplementation of NeRF (Neural Radiance Fields) (ECCV2020)☆10May 4, 2023Updated 2 years ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆78Jan 21, 2026Updated last month
- [CVPR 2021] FMO Deblurring Benchmark☆13Jan 12, 2022Updated 4 years ago
- The source code of BIT.☆18Jun 13, 2025Updated 8 months ago
- Project focused on enhancing the quality of low-fidelity endoscopy images using Generative Adversarial Networks (GANs) implemented in PyT…☆17Jun 5, 2025Updated 8 months ago
- [TIP'24] Key-Axis-based Symmetry Axis Localization (Tags: rotational symmetry; rotation; symmetry; symmetry axis; pose estimation; 6DoF; …☆17Nov 29, 2025Updated 3 months ago
- SIBench: a project on visual spatial reasoning tasks☆25Jan 12, 2026Updated last month
- [ICCV 2025] Official PyTorch Code for "Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval"☆15Aug 23, 2025Updated 6 months ago
- code for paper "Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models"☆46Sep 21, 2023Updated 2 years ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence☆78Updated this week
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".☆200Jun 4, 2025Updated 8 months ago
- Official implementation of the 2024 ECCV paper SHIC: Shape-Image Correspondences with no Keypoint Annotation☆39Oct 1, 2024Updated last year
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆63Jan 1, 2026Updated 2 months ago
- Chapter-wise notebooks for the book 'Practical Natural Language Processing'☆10Apr 21, 2020Updated 5 years ago
- This project implements a Wrinkle Detection application using YOLOv8 for segmentations. The application is built with Streamlit and allow…☆12Aug 14, 2024Updated last year