quangminhdinh / TrafficVLM
[CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of the AI City Challenge 2024 Track 2.
☆34Updated 2 months ago
Alternatives and similar repositories for TrafficVLM:
Users who are interested in TrafficVLM are comparing it to the repositories listed below
- ☆34Updated 9 months ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆36Updated last year
- Official PyTorch code of GroundVQA (CVPR'24)☆59Updated 7 months ago
- ☆81Updated last week
- ☆95Updated 8 months ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".☆37Updated 7 months ago
- Visual self-questioning for large vision-language assistant.☆41Updated 6 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆84Updated 7 months ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"☆108Updated last month
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆71Updated 6 months ago
- 🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU)☆36Updated 2 months ago
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding"☆21Updated 3 weeks ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning"☆32Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆52Updated 9 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆59Updated 10 months ago
- Improving Mamba performance on video understanding tasks☆39Updated 6 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆74Updated 6 months ago
- Video Feature Enhancement with PyTorch☆28Updated 4 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models☆105Updated last month
- This is the official repo for ByteVideoLLM/Dynamic-VLM☆20Updated 4 months ago
- [CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation☆61Updated last month
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆91Updated 4 months ago
- DetToolChain: A new prompting paradigm to unleash detection ability of MLLM☆37Updated 6 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆47Updated 8 months ago
- With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023☆17Updated 10 months ago
- ☆23Updated 9 months ago
- ☆23Updated 2 years ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆66Updated 2 months ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23☆89Updated 11 months ago
- ☆19Updated 6 months ago