Spatial Aptitude Training for Multimodal Language Models
☆33Feb 8, 2026Updated 4 months ago
Alternatives and similar repositories for SAT
Users that are interested in SAT are comparing it to the libraries listed below.
- Code☆55Jun 6, 2026Updated last week
- ☆48Feb 18, 2026Updated 3 months ago
- This repository contains the source codes for the paper: "SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environm…☆16Oct 11, 2021Updated 4 years ago
- Implementation of Language-Conditioned Path Planning (Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James)☆27Sep 1, 2023Updated 2 years ago
- TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics☆21Nov 18, 2025Updated 6 months ago
- 3D household task-based dataset created using customised AI2-THOR.☆14Apr 14, 2022Updated 4 years ago
- [ECCV'24] 3D Reconstruction of Objects in Hands without Real World 3D Supervision☆17Feb 3, 2025Updated last year
- A Vision-Language Model for Spatial Affordance Prediction in Robotics☆223Jul 17, 2025Updated 10 months ago
- A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation☆68Apr 1, 2025Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…☆21Oct 24, 2024Updated last year
- Official implementation of StochSync: a zero-shot approach for image generation in arbitrary spaces via stochastic diffusion synchronizat…☆21Jun 24, 2025Updated 11 months ago
- Reading list for research topics in intuitive physics for artificial cognition.☆25May 17, 2022Updated 4 years ago
- ☆37Aug 25, 2025Updated 9 months ago
- policy with experience☆68Feb 25, 2026Updated 3 months ago
- Source code for EyeRobot☆46Dec 1, 2025Updated 6 months ago
- Code for CVPR 2024 Oral "Neural Lineage"☆17Jun 18, 2024Updated last year
- LogiCity@NeurIPS'24, D&B track. A multi-agent inductive learning environment for "abstractions".☆27Jun 10, 2025Updated last year
- Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning☆49Mar 25, 2026Updated 2 months ago
- Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”☆18Jan 27, 2026Updated 4 months ago
- ☆37Feb 24, 2026Updated 3 months ago
- Official Code for CVPR 2024 paper: Permutation Equivariance of Transformers and Its Applications.☆16Nov 12, 2024Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆71Feb 28, 2024Updated 2 years ago
- Code release for SceneReplica paper.☆28Jul 24, 2025Updated 10 months ago
- [EMNLP 2023 (Findings)] This repository contains data processing, evaluation, and fine-tuning code for NEWTON: Are Large Language Models …☆40Nov 13, 2024Updated last year
- [ACL 2026 Findings] "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning"☆62May 26, 2026Updated 2 weeks ago
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)☆169Apr 3, 2026Updated 2 months ago
- Code for ThriftyDAgger☆14Dec 29, 2021Updated 4 years ago
- [CCS'22] SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders☆18Jul 12, 2022Updated 3 years ago
- [CVPR 2025] Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models"☆32Apr 10, 2025Updated last year
- Official implementation of AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories☆92Feb 17, 2026Updated 3 months ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs☆60Jan 23, 2025Updated last year
- This is a repository for paper titled, PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Plann…☆14Nov 3, 2023Updated 2 years ago
- Implementation of Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins. [RSS 2025]☆54Oct 21, 2025Updated 7 months ago
- Integrating a custom robot (ANYmal C) with mjlab's velocity task☆71Feb 15, 2026Updated 3 months ago
- paper on dexpilot☆15Oct 14, 2019Updated 6 years ago
- ENACT is a benchmark that evaluates embodied cognition through world modeling from egocentric interaction. It is designed to be simple an…☆51Nov 27, 2025Updated 6 months ago
- Official Repository for MolmoAct☆369May 11, 2026Updated last month
- ☆62Aug 7, 2025Updated 10 months ago