aiiu-lab / CLIPCAM
☆9 Updated 2 years ago
Related projects
Alternatives and complementary repositories for CLIPCAM
- ☆187 Updated 2 years ago
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023] ☆108 Updated last year
- SeqTR: A Simple yet Universal Network for Visual Grounding ☆131 Updated 2 weeks ago
- ☆104 Updated 8 months ago
- Code for the paper, Temporal Action Localization with Enhanced Instant Discriminability ☆20 Updated 7 months ago
- CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models ☆56 Updated 8 months ago
- A lightweight codebase for referring expression comprehension and segmentation ☆52 Updated 2 years ago
- ☆174 Updated 2 years ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆98 Updated 9 months ago
- A new framework for open-vocabulary object detection, based on maskrcnn-benchmark ☆226 Updated last year
- ☆77 Updated 2 years ago
- Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection. ☆45 Updated 2 years ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆93 Updated last year
- ☆80 Updated 2 years ago
- Code for our CVPR 2022 Paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection" ☆83 Updated 7 months ago
- Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022 ☆91 Updated last year
- ☆169 Updated 2 years ago
- An unofficial pytorch implementation of "TransVG: End-to-End Visual Grounding with Transformers". ☆51 Updated 3 years ago
- [ECCV2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval ☆76 Updated last year
- ☆35 Updated 7 months ago
- ☆34 Updated 2 years ago
- Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023 ☆35 Updated 9 months ago
- ☆27 Updated last year
- [ICCV 2023] Code for "Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement" ☆138 Updated 6 months ago
- [arXiv22] Disentangled Representation Learning for Text-Video Retrieval ☆91 Updated 2 years ago
- PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529 ☆158 Updated 2 years ago
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding. ☆107 Updated 3 months ago
- [ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization ☆38 Updated 2 years ago
- ☆70 Updated last year
- super image for action recognition ☆55 Updated 2 years ago