woven-visionai / wts-dataset
☆30 Updated 6 months ago
Alternatives and similar repositories for wts-dataset:
Users who are interested in wts-dataset are comparing it to the repositories listed below.
- [CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of t… ☆30 Updated last week
- ☆39 Updated 7 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 Updated this week
- [CVPR2024 Highlight] The official repo for paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception" ☆42 Updated last month
- ☆16 Updated last year
- Code for our paper "Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers" ☆35 Updated last year
- Official implementation of the CVPR paper Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent … ☆25 Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆88 Updated last month
- A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆48 Updated 8 months ago
- Official implementation of the paper "OvarNet: Towards Open-vocabulary Object Attribute Recognition" ☆98 Updated last year
- ☆29 Updated 3 months ago
- Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation" ☆37 Updated last year
- [ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking ☆29 Updated 5 months ago
- ☆17 Updated 2 years ago
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ☆82 Updated 2 months ago
- ☆29 Updated 10 months ago
- (ICLR 2024, CVPR 2024) SparseFormer ☆71 Updated 3 months ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer ☆36 Updated last year
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆62 Updated 6 months ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) ☆55 Updated 10 months ago
- [ICCV2023] AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception ☆36 Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆43 Updated 8 months ago
- A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-t… ☆83 Updated 4 months ago
- ☆61 Updated last month
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆49 Updated 3 months ago
- [NeurIPS 2024] Hawk: Learning to Understand Open-World Video Anomalies ☆24 Updated 4 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM ☆68 Updated 3 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆144 Updated 4 months ago
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect… ☆50 Updated 4 months ago
- Official repository for NuScenes-MQA. The paper was accepted at the LLVM-AD Workshop at WACV 2024. ☆25 Updated last year