sallymmx / m2clip
[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition
☆25Updated 2 months ago
Related projects:
- Frame Flexible Network (CVPR2023)☆52Updated last year
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"☆55Updated 5 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆20Updated last month
- A simple but efficient transformer model for video action recognition☆52Updated last year
- Official code for the paper: MAR: Masked Autoencoders for Efficient Action Recognition☆29Updated last year
- Improving Mamba performance on video understanding tasks☆28Updated last month
- [CVPR2024] UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity☆46Updated this week
- Video Test-Time Adaptation for Action Recognition (CVPR 2023)☆34Updated last year
- [CVPR 2023] Official PyTorch implementation of the paper "GAP: Post-Processing Temporal Action Detection"☆16Updated last year
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"☆91Updated 7 months ago
- Official implementation of the paper "OvarNet: Towards Open-vocabulary Object Attribute Recognition"☆98Updated last year
- [T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection☆34Updated last year
- Code for Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID (CVPR 2024)☆29Updated 2 months ago
- UniMD: Towards Unifying Moment retrieval and temporal action Detection☆32Updated 2 months ago
- CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models☆54Updated 2 months ago
- [NeurIPS 2022] PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points☆38Updated 9 months ago
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer☆52Updated 5 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆35Updated 11 months ago
- CVPR2022 - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation☆22Updated 2 years ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"☆62Updated 4 months ago
- [AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation☆62Updated 2 months ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. Also, visualization and qb norm search for best performance…☆28Updated 5 months ago
- Open-vocabulary Semantic Segmentation☆32Updated 7 months ago
- Code for the paper, Temporal Action Localization with Enhanced Instant Discriminability☆20Updated 5 months ago
- Official PyTorch implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation☆34Updated 3 weeks ago