NVlabs / VideoITG
☆65 · Updated this week
Alternatives and similar repositories for VideoITG
Users who are interested in VideoITG are comparing it to the repositories listed below.
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness". ☆44 · Updated 3 weeks ago
- This repository is dedicated to Track 2 of the W-CODA 2024 Workshop, "Multimodal Perception and Comprehension of Corner Cases in Autonomo… ☆11 · Updated last year
- [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding ☆65 · Updated 3 weeks ago
- ☆63 · Updated 11 months ago
- [NeurIPS 2024] DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model ☆70 · Updated 7 months ago
- [ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes ☆126 · Updated 4 months ago
- [CVPR 2025 Highlight🔥] Official code repository for "Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuni… ☆93 · Updated 2 months ago
- Official PyTorch implementation of GeoDiffusion in ICLR 2024 (https://arxiv.org/abs/2306.04607) ☆90 · Updated 6 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆98 · Updated 3 months ago
- The first decoder-only multimodal state space model ☆92 · Updated 2 months ago
- Doe-1: Closed-Loop Autonomous Driving with Large World Model ☆98 · Updated 5 months ago
- Official code for ECCV 2024 paper "Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection" ☆14 · Updated last year
- Code & Data for Grounded 3D-LLM with Referent Tokens ☆123 · Updated 6 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms ☆23 · Updated last month
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ☆92 · Updated 7 months ago
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆138 · Updated last month
- ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving ☆111 · Updated last month
- [AAAI 2024] Mono3DVG: 3D Visual Grounding in Monocular Images ☆58 · Updated last year
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆69 · Updated last week
- [CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers ☆98 · Updated 2 weeks ago
- Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024 ☆29 · Updated last year
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World" ☆68 · Updated 2 weeks ago
- Official Code Release of Delphi ☆54 · Updated last year
- [ECCV 2024] WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation ☆107 · Updated 5 months ago
- [CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D De… ☆96 · Updated 11 months ago
- [CVPR2024] Official Repository of Paper "Panacea: Panoramic and Controllable Video Generation for Autonomous Driving" ☆234 · Updated 11 months ago
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆77 · Updated 9 months ago
- Code for CVPR2025 paper: Generating Multimodal Driving Scenes via Next-Scene Prediction ☆70 · Updated 4 months ago
- ☆16 · Updated last year
- The official code of DriveMonkey ☆29 · Updated last month