arijitray1993 / SAT
Spatial Aptitude Training for Multimodal Language Models
☆19 · Updated 3 weeks ago
Alternatives and similar repositories for SAT
Users interested in SAT are comparing it to the repositories listed below
- Code for Stable Control Representations☆26 · Updated 7 months ago
- HD-EPIC: Python script to download the entire dataset or parts of it☆14 · Updated last month
- ☆78 · Updated 6 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks☆58 · Updated last year
- [ECCV'24] 3D Reconstruction of Objects in Hands without Real World 3D Supervision☆16 · Updated 9 months ago
- ☆37 · Updated 9 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces☆86 · Updated 5 months ago
- ☆46 · Updated last year
- Code for the paper "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"☆29 · Updated last year
- [ICLR 2025] Subtask-Aware Visual Reward Learning from Segmented Demonstrations☆17 · Updated 7 months ago
- [ICCV 2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding☆77 · Updated 2 years ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment☆88 · Updated 5 months ago
- [CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images"☆82 · Updated last year
- A paper list of world models and generative video models for embodied agents.☆25 · Updated 10 months ago
- ☆87 · Updated last year
- [ICCV 2023] Understanding 3D Object Interaction from a Single Image☆47 · Updated last year
- ☆18 · Updated last year
- Slot-TTA shows that test-time adaptation with slot-centric models can improve image segmentation on out-of-distribution examples.☆26 · Updated 2 years ago
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction☆42 · Updated 2 months ago
- [NeurIPS 2025] Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆98 · Updated 3 weeks ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆78 · Updated 5 months ago
- [NeurIPS 2023] Official code for the paper "3D-Aware Visual Question Answering about Parts, Poses and Occlusions"☆19 · Updated last year
- A unified robotic manipulation learning framework☆18 · Updated 2 months ago
- [ICLR 2025 Spotlight] Grounding Video Models to Actions through Goal Conditioned Exploration☆58 · Updated 6 months ago
- ☆21 · Updated last year
- Egocentric Video Understanding Dataset (EVUD)☆32 · Updated last year
- [TMLR 2025] Official repository for the paper "Unsupervised Discovery of Object-Centric Neural Fields"☆18 · Updated 9 months ago
- Code for the paper "World-in-World: World Models in a Closed-Loop World"☆98 · Updated this week
- VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning☆93 · Updated last month
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆48 · Updated 2 months ago