wengzejia1/Open-VCLIP

☆120

Alternatives and similar repositories for Open-VCLIP

Users who are interested in Open-VCLIP are comparing it to the repositories listed below.

Visual-AI / FROSTER
[ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition
☆97 · Jan 14, 2025 · Updated last year
dmoltisanti / air-cvpr23
This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…
☆13 · May 25, 2023 · Updated 2 years ago
wlin-at / MAXI
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023)
☆30 · Sep 5, 2023 · Updated 2 years ago
muzairkhattak / ViFi-CLIP
[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".
☆304 · Apr 3, 2024 · Updated last year
lucaspk512 / vrdone
Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection".
☆11 · Nov 13, 2024 · Updated last year
leexinhao / ZeroI2V
[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
☆20 · Jul 29, 2024 · Updated last year
farewellthree / STAN
Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"
☆107 · Jan 28, 2024 · Updated 2 years ago
Sarinda251 / CDFSL-V
Accepted at ICCV '23
☆15 · Oct 4, 2023 · Updated 2 years ago
haochenheheda / LVVIS
Large-Vocabulary Video Instance Segmentation dataset
☆96 · Jul 5, 2024 · Updated last year
Francis-Rings / ILA
[ICCV2023 Oral] Implicit Temporal Modeling with Learnable Alignment for Video Recognition
☆41 · Nov 29, 2023 · Updated 2 years ago
alibaba-mmai-research / DiST
ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
☆41 · Sep 25, 2023 · Updated 2 years ago
TalalWasim / Vita-CLIP
Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]
☆127 · Jul 1, 2023 · Updated 2 years ago
benedettaliberatori / T3AL
Official implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024
☆70 · Sep 11, 2024 · Updated last year
Mia-YatingYu / STDD
[AAAI'25]: Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP
☆19 · Aug 5, 2025 · Updated 7 months ago
HJYao00 / Side4Video
☆42 · Apr 7, 2024 · Updated last year
whwu95 / Text4Vis
【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective
☆198 · May 30, 2024 · Updated last year
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74 · Jan 20, 2025 · Updated last year
ut-vision / ActionVOS
[ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation
☆31 · Dec 4, 2024 · Updated last year
DeLightCMU / ElaborativeRehearsal
This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)
☆36 · Apr 9, 2022 · Updated 3 years ago
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆62 · May 2, 2024 · Updated last year
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 · Apr 27, 2023 · Updated 2 years ago
scofield7419 / MUIE-REAMO
Code of the Grounded MUIE model, REAMO
☆11 · Dec 3, 2024 · Updated last year
KPeng9510 / Trans4SOAR
☆14 · Apr 1, 2023 · Updated 2 years ago
tomchen-ctj / CVPR23-LOVEU-AQTC
【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge
☆15 · Jul 18, 2023 · Updated 2 years ago
alibaba-mmai-research / CLIP-FSAR
Code for our IJCV 2023 paper "CLIP-guided Prototype Modulating for Few-shot Action Recognition".
☆77 · Mar 7, 2024 · Updated 2 years ago
whwu95 / BIKE
【CVPR'2023】Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
☆154 · Sep 9, 2024 · Updated last year
pulkitkumar95 / tats
☆18 · Feb 20, 2025 · Updated last year
tomchen-ctj / OST
【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
☆38 · Apr 27, 2024 · Updated last year
janghyuncho / DECOLA
Code release for "Language-conditioned Detection Transformer"
☆88 · Jun 17, 2024 · Updated last year
ThomasWangY / 2024-AAAI-HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)
☆73 · Feb 3, 2025 · Updated last year
taoyang1122 / adapt-image-models
[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
☆300 · Sep 17, 2023 · Updated 2 years ago
engindeniz / vitis
[ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
☆14 · Jan 13, 2025 · Updated last year
srijandas07 / clip_baseline_LTA_Ego4d
Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022)
☆15 · Jul 4, 2022 · Updated 3 years ago
DCDmllm / Momentor
☆80 · Nov 24, 2024 · Updated last year
Shahzadnit / EZ-CLIP
☆21 · May 11, 2025 · Updated 9 months ago
whwu95 / ATM
【ICCV'2023】What Can Simple Arithmetic Operations Do for Temporal Modeling?
☆73 · Jan 26, 2024 · Updated 2 years ago
yuxiaochen1103 / FDT
☆62 · Jun 16, 2023 · Updated 2 years ago
AndongDeng / BEAR
BEAR: a new BEnchmark on video Action Recognition
☆46 · Apr 21, 2024 · Updated last year
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆66 · Jun 28, 2024 · Updated last year