quangminhdinh / TrafficVLM
[CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of the AI City Challenge 2024 Track 2.
☆29Updated 5 months ago
Related projects ⓘ
Alternatives and complementary repositories for TrafficVLM
- ☆29Updated 3 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 3 months ago
- Multimodal Video Understanding Framework (MVU)☆24Updated 6 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.☆51Updated 2 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆49Updated 2 months ago
- Language Repository for Long Video Understanding☆28Updated 5 months ago
- ☆55Updated 4 months ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"☆27Updated 9 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆49Updated 5 months ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".☆31Updated 2 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds☆82Updated 4 months ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"☆82Updated 3 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision☆24Updated last month
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆52Updated 2 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆78Updated 8 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆61Updated last month
- ☆90Updated 6 months ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆33Updated last year
- Official implement of MIA-DPO☆40Updated 2 weeks ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆54Updated 7 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆55Updated 3 weeks ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆58Updated 3 weeks ago
- [ECCV 2024 Best Paper Candidate] Implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Vi…☆40Updated last month
- [AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation☆69Updated 4 months ago
- Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆40Updated 4 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆22Updated 6 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆83Updated 4 months ago
- Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)☆73Updated 3 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆24Updated 5 months ago