minghu0830 / NurViD-benchmark
☆18Updated 8 months ago
Related projects:
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆38Updated 3 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.☆49Updated last week
- ☆67Updated last year
- ☆31Updated last year
- ☆19Updated last month
- ☆25Updated last year
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"☆55Updated 5 months ago
- ☆10Updated last week
- ☆32Updated last year
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]☆104Updated last year
- Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)☆62Updated 7 months ago
- ☆14Updated 9 months ago
- CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models☆52Updated 6 months ago
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H…☆17Updated 4 months ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. Also, visualization and qb norm search for best performance…☆28Updated 5 months ago
- [CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning☆103Updated 5 months ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning☆31Updated last month
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆27Updated 5 months ago
- ☆34Updated 5 months ago
- (CVPR 2023) Official implementation of the paper "Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos…☆27Updated 5 months ago
- [ICLR2023] PLOT: Prompt Learning with Optimal Transport for Vision-Language Models☆137Updated 9 months ago
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval☆42Updated last year
- Composed Video Retrieval☆42Updated 4 months ago
- Official PyTorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)☆46Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆35Updated 11 months ago
- ☆60Updated last year
- Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]☆91Updated last year
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆75Updated 6 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆23Updated 6 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆38Updated 2 months ago