quangminhdinh / TrafficVLM
[CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of the AI City Challenge 2024 Track 2.
☆28Updated 3 months ago
Related projects: ⓘ
- ☆25Updated last month
- Multimodal Video Understanding Framework (MVU)☆23Updated 4 months ago
- Language Repository for Long Video Understanding☆27Updated 3 months ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆31Updated 11 months ago
- ☆49Updated last month
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆20Updated 4 months ago
- Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.☆22Updated 8 months ago
- Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'☆44Updated 3 weeks ago
- ☆35Updated 11 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆42Updated 3 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆15Updated last week
- ☆29Updated 2 months ago
- ☆20Updated 4 months ago
- [CBMI2024] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?".☆17Updated 2 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback☆39Updated last week
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds☆80Updated 2 months ago
- ☆80Updated 4 months ago
- This repository compiles a list of papers related to Video LLM.☆16Updated 2 months ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"☆68Updated last month
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning☆93Updated 2 months ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆99Updated 10 months ago
- Dense Connector for MLLMs☆98Updated last month
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆27Updated this week
- Data-Efficient Multimodal Fusion on a Single GPU☆45Updated 4 months ago
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆24Updated 3 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆22Updated 3 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences☆17Updated 3 weeks ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".☆22Updated last week
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆45Updated last month
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆67Updated this week