WeizhenWang-1210 / MetaVQA
☆11 · Updated last month
Alternatives and similar repositories for MetaVQA
Users who are interested in MetaVQA are comparing it to the repositories listed below.
- [CVPR2024 Highlight] The official repo for paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception" ☆55 · Updated 3 months ago
- A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-t… ☆101 · Updated 9 months ago
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ☆92 · Updated 7 months ago
- [ECCV 2024] Official implementation of C-Instructor: Controllable Navigation Instruction Generation with Chain of Thought Prompting ☆24 · Updated 7 months ago
- Unified Vision-Language-Action Model ☆128 · Updated 2 weeks ago
- ☆63 · Updated 11 months ago
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving ☆89 · Updated last year
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆43 · Updated last week
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆217 · Updated 7 months ago
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding ☆41 · Updated this week
- Benchmark and model for step-by-step reasoning in autonomous driving. ☆61 · Updated 4 months ago
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge ☆71 · Updated this week
- [NeurIPS 2024] DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model ☆69 · Updated 7 months ago
- [RSS 2024] Learning Manipulation by Predicting Interaction ☆110 · Updated 2 weeks ago
- A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and A… ☆229 · Updated this week
- [ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes ☆126 · Updated 4 months ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness". ☆44 · Updated 3 weeks ago
- CoRL2024 | Hint-AD: Holistically Aligned Interpretability for End-to-End Autonomous Driving ☆62 · Updated 8 months ago
- [ECCV 2024] Official GitHub repository for the paper "LingoQA: Visual Question Answering for Autonomous Driving" ☆175 · Updated 9 months ago
- Doe-1: Closed-Loop Autonomous Driving with Large World Model ☆98 · Updated 5 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆45 · Updated 3 weeks ago
- ☆45 · Updated last month
- Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling" ☆105 · Updated this week
- Code&Data for Grounded 3D-LLM with Referent Tokens ☆123 · Updated 6 months ago
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆56 · Updated 4 months ago
- ☆13 · Updated last year
- [AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario. ☆197 · Updated 8 months ago
- The official Talk2Car dataset repo ☆85 · Updated last month
- [ICCV 2025] Latent Motion Token as the Bridging Language for Robot Manipulation ☆110 · Updated 2 months ago
- [NeurIPS 2024] Official code repository for MSR3D paper ☆60 · Updated last month