remyxai / VQASynthLinks

Compose multimodal datasets 🎹

☆497

Alternatives and similar repositories for VQASynth

Users that are interested in VQASynth are comparing it to the libraries listed below

Sorting:

vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆607Updated 2 months ago
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆269Updated 10 months ago
embodied-generalist / embodied-generalist
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
☆464Updated 6 months ago
facebookresearch / open-eqa
OpenEQA Embodied Question Answering in the Era of Foundation Models
☆323Updated last year
InternRobotics / EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
☆634Updated 4 months ago
BAAI-DCAI / SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models.
☆312Updated last month
ZCMax / LLaVA-3D
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
☆329Updated this week
TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
☆830Updated last year
ZzZZCHS / Chat-Scene
Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)
☆195Updated 6 months ago
scene-verse / SceneVerse
Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
☆265Updated 7 months ago
UMass-Embodied-AGI / 3D-VLA
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
☆579Updated 11 months ago
Gabesarch / grounded-rl
☆97Updated 3 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆718Updated last month
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆191Updated 5 months ago
diankun-wu / Spatial-MLLM
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆366Updated 4 months ago
liudaizong / Awesome-3D-Visual-Grounding
😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
☆229Updated last week
deepcs233 / Visual-CoT
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆387Updated 10 months ago
Open3DA / LL3DA
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Langu…
☆308Updated last year
alibaba-damo-academy / WorldVLA
WorldVLA: Towards Autoregressive Action World Model
☆453Updated 2 weeks ago
LostXine / LLaRA
[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
☆225Updated 6 months ago
SilongYong / SQA3D
[ICLR 2023] SQA3D for embodied scene understanding and reasoning
☆149Updated 2 years ago
ATR-DBI / ScanQA
☆141Updated 2 years ago
behavior-vision-suite / behavior-vision-suite.github.io
☆168Updated 8 months ago
allenai / molmo
Code for the Molmo Vision-Language Model
☆779Updated 10 months ago
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for paper ''Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding''.
☆168Updated 4 months ago
InternRobotics / Grounded_3D-LLM
Code&Data for Grounded 3D-LLM with Referent Tokens
☆127Updated 9 months ago
bdaiinstitute / theia
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
☆255Updated 6 months ago
LatentActionPretraining / LAPA
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
☆387Updated 9 months ago
liruiw / HPT
Heterogeneous Pre-trained Transformer (HPT) as Scalable Policy Learner.
☆517Updated 10 months ago
YueFan1014 / VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
☆264Updated 10 months ago