YorkUCVIL / VTCD
☆13 Updated 5 months ago
Related projects
Alternatives and complementary repositories for VTCD
- Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023] ☆89 Updated 6 months ago
- Large-Vocabulary Video Instance Segmentation dataset ☆76 Updated 4 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆24 Updated 5 months ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ☆61 Updated 7 months ago
- [ICLR 2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models. ☆56 Updated 3 months ago
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs". ☆42 Updated 2 months ago
- ☆71 Updated last year
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆42 Updated 3 months ago
- Official implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024 ☆43 Updated 2 months ago
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction ☆169 Updated 9 months ago
- [ECCV 2024 Best Paper Candidate] Implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Vi… ☆40 Updated last month
- Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection" ☆81 Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆39 Updated last year
- [NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization ☆97 Updated 9 months ago
- ☆45 Updated 10 months ago
- Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders ☆93 Updated 3 months ago
- Official This-Is-My Dataset published in CVPR 2023 ☆15 Updated 4 months ago
- Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024 ☆57 Updated 5 months ago
- Text-Image Alignment for Diffusion-based Perception (TADP) - CVPR 2024 ☆24 Updated 2 months ago
- [CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models". ☆91 Updated 3 months ago
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) ☆37 Updated 10 months ago
- A curated list of awesome self-supervised learning methods in videos ☆114 Updated this week
- Code release for "Language-conditioned Detection Transformer" ☆85 Updated 5 months ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024 ☆46 Updated last month
- ☆36 Updated 7 months ago
- ☆12 Updated 8 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆51 Updated 2 months ago
- Official repository of paper "Subobject-level Image Tokenization" ☆62 Updated 6 months ago
- FreeDA: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024) ☆29 Updated 2 months ago
- [WACV 2025] Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models" ☆58 Updated 3 weeks ago