Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"
☆107Jan 28, 2024Updated 2 years ago
Alternatives and similar repositories for STAN
Users that are interested in STAN are comparing it to the libraries listed below
Sorting:
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)☆66Jun 7, 2024Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆41Sep 25, 2023Updated 2 years ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment☆53Apr 9, 2024Updated last year
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]☆127Jul 1, 2023Updated 2 years ago
- ☆181Aug 20, 2022Updated 3 years ago
- ☆42Apr 7, 2024Updated last year
- ☆110Dec 23, 2022Updated 3 years ago
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)☆32Mar 29, 2024Updated last year
- ☆120Feb 19, 2024Updated 2 years ago
- ☆85May 8, 2023Updated 2 years ago
- 【CVPR'2023】Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models☆154Sep 9, 2024Updated last year
- [AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning☆68Feb 16, 2024Updated 2 years ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…☆25Jun 4, 2025Updated 8 months ago
- VideoX: a collection of video cross-modal models☆1,061Jun 3, 2024Updated last year
- Source code of our MM'22 paper Partially Relevant Video Retrieval☆55Nov 4, 2024Updated last year
- 【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?☆256Nov 29, 2024Updated last year
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos☆19Mar 3, 2025Updated 11 months ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding☆40Mar 16, 2025Updated 11 months ago
- [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning☆126Dec 28, 2024Updated last year
- [TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset☆306Dec 25, 2024Updated last year
- [ICCV 2023 Workshop] The Official Implementation of The First Prize Solution for RVOS Competition☆14Jan 1, 2024Updated 2 years ago
- ☆76Oct 22, 2022Updated 3 years ago
- [ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model☆140Apr 9, 2024Updated last year
- [arXiv22] Disentangled Representation Learning for Text-Video Retrieval☆98Apr 7, 2022Updated 3 years ago
- ☆21May 11, 2025Updated 9 months ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)☆107Jan 23, 2025Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆16Jan 31, 2024Updated 2 years ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆66Jun 28, 2024Updated last year
- Official PyTorch code of GroundVQA (CVPR'24)☆64Sep 13, 2024Updated last year
- Multi-modality pre-training☆510May 8, 2024Updated last year
- [WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval"☆16Feb 24, 2025Updated last year
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024☆18Oct 11, 2024Updated last year
- Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…☆51May 29, 2024Updated last year
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆42Oct 19, 2025Updated 4 months ago
- (NeurIPS 2023) Open-set visual object query search & localization in long-form videos☆26Feb 1, 2024Updated 2 years ago
- ☆30Aug 14, 2023Updated 2 years ago
- [ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation☆31Dec 4, 2024Updated last year
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆132Nov 19, 2024Updated last year
- ☆26Jan 12, 2022Updated 4 years ago