linziyi96 / st-adapter
☆77 Updated last year
Alternatives and similar repositories for st-adapter:
Users who are interested in st-adapter are comparing it to the repositories listed below.
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023] ☆116 Updated last year
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ☆77 Updated 2 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆41 Updated last year
- Video Test-Time Adaptation for Action Recognition (CVPR 2023) ☆43 Updated 6 months ago
- ☆39 Updated last year
- ☆47 Updated 2 years ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆97 Updated last year
- ☆29 Updated last year
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆99 Updated last year
- CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models ☆63 Updated last year
- Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23] ☆97 Updated last year
- ☆193 Updated 2 years ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆81 Updated last year
- ☆92 Updated last year
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆67 Updated 5 months ago
- ☆61 Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆52 Updated 9 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆59 Updated 7 months ago
- SeqTR: A Simple yet Universal Network for Visual Grounding ☆134 Updated 5 months ago
- ☆112 Updated last year
- ☆34 Updated last year
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning" ☆32 Updated last year
- Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024) ☆68 Updated 2 months ago
- Code for our IJCV 2023 paper "CLIP-guided Prototype Modulating for Few-shot Action Recognition". ☆61 Updated last year
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning ☆30 Updated this week
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆55 Updated 9 months ago
- Official Implementation of SnAG (CVPR 2024) ☆44 Updated 5 months ago
- Official code for the paper: MAR: Masked Autoencoders for Efficient Action Recognition ☆31 Updated 2 years ago
- ☆23 Updated 2 years ago
- Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection. ☆47 Updated 2 years ago