TalalWasim / Vita-CLIPLinks

Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]

☆127

Alternatives and similar repositories for Vita-CLIP

Users that are interested in Vita-CLIP are comparing it to the libraries listed below

Sorting:

linziyi96 / st-adapter
☆84Updated 2 years ago
Visual-AI / FROSTER
[ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition
☆92Updated 10 months ago
ruc-aimc-lab / TeachCLIP
[CVPR 2024] TeachCLIP for Text-to-Video Retrieval
☆40Updated 6 months ago
HJYao00 / Side4Video
☆42Updated last year
bladewaltz1 / PromptSwitch
☆30Updated 2 years ago
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆63Updated last year
Ziyang412 / UCoFiA
Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆66Updated last year
jpthu17 / HBI
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
☆122Updated 11 months ago
wengzejia1 / Open-VCLIP
☆119Updated last year
park-jungin / DualPath
☆49Updated 3 years ago
Artanic30 / HOICLIP
CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
☆68Updated last year
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆83Updated last year
fmu2 / snag_release
Official Implementation of SnAG (CVPR 2024)
☆55Updated 7 months ago
alibaba-mmai-research / DiST
ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
☆41Updated 2 years ago
benedettaliberatori / T3AL
Official implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024
☆68Updated last year
ju-chen / Efficient-Prompt
☆193Updated 3 years ago
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆61Updated last year
LiuRicky / ts2_net
[ECCV 2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
☆78Updated 3 years ago
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆62Updated last year
ThomasWangY / 2024-AAAI-HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)
☆73Updated 10 months ago
farewellthree / STAN
Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"
☆107Updated last year
IIGROUP / MAP
☆37Updated 3 years ago
ninatu / everything_at_once
Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval." CVPR 2022
☆115Updated 3 years ago
knightyxp / DGL
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
☆44Updated last year
jy0205 / STCAT
[NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
☆53Updated last year
chunmeifeng / SPRC
【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval
☆91Updated last year
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆54Updated last year
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64Updated last year
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆107Updated 10 months ago
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41Updated 7 months ago