remyxai / VQASynth
Compose multimodal datasets 🎹
⭐545 · Jan 5, 2026 · Updated last month
Alternatives and similar repositories for VQASynth
Users interested in VQASynth are comparing it to the libraries listed below.
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ⭐309 · Dec 14, 2024 · Updated last year
- Official repo and evaluation implementation of VSI-Bench ⭐670 · Aug 5, 2025 · Updated 6 months ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ⭐335 · Sep 14, 2025 · Updated 5 months ago
- Awesome-LLM-3D: a curated list of Multi-modal Large Language Model resources in the 3D world ⭐2,115 · Feb 3, 2026 · Updated last week
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ⭐59 · Jan 23, 2025 · Updated last year
- Official repository of Learning to Act from Actionless Videos through Dense Correspondences. ⭐247 · Apr 25, 2024 · Updated last year
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ⭐1,174 · Jun 6, 2024 · Updated last year
- A Vision-Language Model for Spatial Affordance Prediction in Robotics ⭐213 · Jul 17, 2025 · Updated 6 months ago
- Dreamitate: Real-World Visuomotor Policy Learning via Video Generation (CoRL 2024) ⭐58 · Jun 7, 2025 · Updated 8 months ago
- Code of 3DMIT: 3D Multi-Modal Instruction Tuning for Scene Understanding ⭐31 · Jul 26, 2024 · Updated last year
- ⭐432 · Nov 29, 2025 · Updated 2 months ago
- ⭐12 · Jan 10, 2025 · Updated last year
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ⭐372 · Oct 21, 2025 · Updated 3 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ⭐435 · Feb 5, 2026 · Updated last week
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ⭐79 · Jan 21, 2026 · Updated 3 weeks ago
- Embodied Reasoning Question Answer (ERQA) Benchmark ⭐258 · Mar 12, 2025 · Updated 11 months ago
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models. ⭐139 · Mar 25, 2023 · Updated 2 years ago
- Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo… ⭐974 · Dec 20, 2025 · Updated last month
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models ⭐783 · Feb 20, 2025 · Updated 11 months ago
- [ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model ⭐620 · Oct 29, 2024 · Updated last year
- A fork to add multimodal model training to open-r1 ⭐1,474 · Feb 8, 2025 · Updated last year
- OpenEQA: Embodied Question Answering in the Era of Foundation Models ⭐340 · Sep 20, 2024 · Updated last year
- ⭐150 · Aug 23, 2023 · Updated 2 years ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ⭐3,754 · Nov 28, 2025 · Updated 2 months ago
- ⭐78 · May 23, 2025 · Updated 8 months ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ⭐4,234 · Sep 26, 2025 · Updated 4 months ago
- ⭐4,562 · Sep 14, 2025 · Updated 5 months ago
- [NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding ⭐100 · Feb 2, 2025 · Updated last year
- [ICML 2024] LEO: An Embodied Generalist Agent in 3D World ⭐475 · Apr 20, 2025 · Updated 9 months ago
- [CVPR 2024] The code for the paper "Towards Learning a Generalist Model for Embodied Navigation" ⭐228 · Jun 18, 2024 · Updated last year
- OpenVLA: An open-source vision-language-action model for robotic manipulation. ⭐5,251 · Mar 23, 2025 · Updated 10 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ⭐203 · May 5, 2025 · Updated 9 months ago
- Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation" ⭐300 · Apr 22, 2024 · Updated last year
- ✨ First Open-Source R1-like Video-LLM [2025/02/18] ⭐381 · Feb 23, 2025 · Updated 11 months ago
- [CoRL 24 Oral] D^3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement ⭐180 · Nov 2, 2024 · Updated last year
- Solve Visual Understanding with Reinforced VLMs ⭐5,841 · Oct 21, 2025 · Updated 3 months ago
- CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks ⭐829 · Sep 8, 2025 · Updated 5 months ago
- [ECCV 2024] ShapeLLM: Universal 3D Object Understanding for Embodied Interaction ⭐225 · Oct 8, 2024 · Updated last year
- [RSS 2024] Learning Manipulation by Predicting Interaction ⭐118 · Jul 2, 2025 · Updated 7 months ago