PRIS-CV / Category-Specific-Prompt
Code release for "Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models"
☆12 · Updated last year
Alternatives and similar repositories for Category-Specific-Prompt
Users who are interested in Category-Specific-Prompt are comparing it to the repositories listed below.
- ☆21 · Updated 11 months ago
- Official implementation of TagAlign ☆35 · Updated 9 months ago
- ☆25 · Updated 2 years ago
- ☆58 · Updated 2 years ago
- [ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition ☆88 · Updated 8 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆67 · Updated 11 months ago
- Disentangled Pre-training for Human-Object Interaction Detection ☆25 · Updated last week
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023) ☆40 · Updated 2 years ago
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning ☆26 · Updated 2 years ago
- [NeurIPS 2022] PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points ☆46 · Updated last year
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆106 · Updated last year
- [CVPR 2024] TeachCLIP for Text-to-Video Retrieval ☆39 · Updated 4 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆21 · Updated 7 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆36 · Updated 9 months ago
- ☆32 · Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆16 · Updated last year
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models ☆19 · Updated 3 months ago
- Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025) ☆16 · Updated 2 months ago
- ☆30 · Updated 2 years ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆40 · Updated 6 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆63 · Updated last year
- Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning ☆20 · Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆27 · Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method. ☆32 · Updated 2 years ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. ☆43 · Updated 11 months ago
- [ICCV2023 Oral] Implicit Temporal Modeling with Learnable Alignment for Video Recognition ☆40 · Updated last year
- Tracking with Human-Intent Reasoning ☆72 · Updated 10 months ago
- Referring Image Segmentation Benchmarking with Segment Anything Model (SAM) ☆38 · Updated 2 years ago
- The benchmark for "Video Object Segmentation in Panoptic Wild Scenes". ☆12 · Updated last year
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation ☆29 · Updated last year