WeizhenWang-1210 / MetaVQA

☆11

Alternatives and similar repositories for MetaVQA

Users interested in MetaVQA are comparing it to the libraries listed below.


jeffreychou777 / LOTVS-MM-AU
[CVPR2024 Highlight] The official repo for paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception"
☆55 · Updated 3 months ago
YuanJianhao508 / RAG-Driver
A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-t…
☆101 · Updated 9 months ago
DLUT-LYZ / CODA-LM
Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595)
☆92 · Updated 7 months ago
refkxh / C-Instructor
[ECCV 2024] Official implementation of C-Instructor: Controllable Navigation Instruction Generation with Chain of Thought Prompting
☆24 · Updated 7 months ago
baaivision / UniVLA
Unified Vision-Language-Action Model
☆128 · Updated 2 weeks ago
xmed-lab / NuInstruct
☆63 · Updated 11 months ago
fudan-zvg / Reason2Drive
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
☆89 · Updated last year
OpenRobotLab / MMSI-Bench
[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
☆43 · Updated last week
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆217 · Updated 7 months ago
OpenRobotLab / OST-Bench
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
☆41 · Updated this week
ayesha-ishaq / DriveLMM-o1
Benchmark and model for step-by-step reasoning in autonomous driving.
☆61 · Updated 4 months ago
Zhangwenyao1 / DreamVLA
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
☆71 · Updated this week
Robertwyq / Drivingdojo
[NeurIPS 2024] DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model
☆69 · Updated 7 months ago
OpenDriveLab / MPI
[RSS 2024] Learning Manipulation by Predicting Interaction
☆110 · Updated 2 weeks ago
leofan90 / Awesome-World-Models
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and A…
☆229 · Updated this week
jxbbb / TOD3Cap
[ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
☆126 · Updated 4 months ago
Haochen-Wang409 / ross3d
Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".
☆44 · Updated 3 weeks ago
Robot-K / Hint-AD
CoRL2024 | Hint-AD: Holistically Aligned Interpretability for End-to-End Autonomous Driving
☆62 · Updated 8 months ago
wayveai / LingoQA
[ECCV 2024] Official GitHub repository for the paper "LingoQA: Visual Question Answering for Autonomous Driving"
☆175 · Updated 9 months ago
wzzheng / Doe
Doe-1: Closed-Loop Autonomous Driving with Large World Model
☆98 · Updated 5 months ago
GLUS-video / GLUS
[CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…
☆45 · Updated 3 weeks ago
XiandaGuo / Drive-MLLM
☆45 · Updated last month
OpenRobotLab / StreamVLN
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
☆105 · Updated this week
OpenRobotLab / Grounded_3D-LLM
Code&Data for Grounded 3D-LLM with Referent Tokens
☆123 · Updated 6 months ago
linkangheng / Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
☆56 · Updated 4 months ago
zhangzaibin / AD-H
☆13 · Updated last year
qiantianwen / NuScenes-QA
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
☆197 · Updated 8 months ago
talk2car / Talk2Car
The official Talk2Car dataset repo
☆85 · Updated last month
TencentARC / Moto
[ICCV 2025] Latent Motion Token as the Bridging Language for Robot Manipulation
☆110 · Updated 2 months ago
MSR3D / MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
☆60 · Updated last month