woven-visionai / wts-dataset
☆30 · Updated 5 months ago
Alternatives and similar repositories for wts-dataset:
Users who are interested in wts-dataset are comparing it to the repositories listed below:
- [CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of t… ☆30 · Updated 7 months ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer ☆35 · Updated last year
- Code for our paper "Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers" ☆35 · Updated last year
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆35 · Updated last year
- Official implementation of the CVPR paper Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent … ☆25 · Updated last year
- ☆36 · Updated 6 months ago
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆37 · Updated last week
- ☆16 · Updated 11 months ago
- [ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking ☆42 · Updated 2 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 · Updated 2 months ago
- [ICCV'23] Cascade-DETR: Delving into High-Quality Universal Object Detection ☆96 · Updated last year
- [CVPR2024 Highlight] The official repo for paper "Abductive Ego-View Accident Video Understanding for Safe Driving Perception" ☆40 · Updated 3 weeks ago
- A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆48 · Updated 7 months ago
- Code release for "Language-conditioned Detection Transformer" ☆85 · Updated 7 months ago
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ☆80 · Updated last month
- Video Feature Enhancement with PyTorch ☆25 · Updated last month
- ☆32 · Updated 7 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆27 · Updated 2 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆141 · Updated 3 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆42 · Updated 7 months ago
- 🚀【AAAI 2025】Cross-View Referring Multi-Object Tracking ☆36 · Updated 3 weeks ago
- Open-vocabulary Semantic Segmentation ☆35 · Updated 11 months ago
- ☆17 · Updated 2 years ago
- ☆25 · Updated 2 months ago
- Official repository for the NuScenes-MQA. This paper is accepted by LLVA-AD Workshop at WACV 2024. ☆24 · Updated last year
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM ☆65 · Updated 2 months ago
- [ICCV 2023] Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment ☆43 · Updated last year
- [ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking ☆29 · Updated 4 months ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection". ☆32 · Updated 4 months ago