MCG-NJU / ZeroI2V
[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
☆21 Updated 8 months ago
Alternatives and similar repositories for ZeroI2V:
Users who are interested in ZeroI2V are comparing it to the repositories listed below
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' ☆33 Updated last year
- [T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection ☆35 Updated last year
- Data release for Step Differences in Instructional Video (CVPR24) ☆13 Updated 10 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos ☆11 Updated 10 months ago
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset ☆56 Updated 7 months ago
- [ICCV 2023] MGMAE: Motion Guided Masking for Video Masked Autoencoding ☆20 Updated last year
- [TCSVT 2024] Temporally Consistent Referring Video Object Segmentation with Hybrid Memory ☆17 Updated 2 weeks ago
- [ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation ☆31 Updated 4 months ago
- ☆47 Updated 2 years ago
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023) ☆27 Updated 9 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter". ☆16 Updated last year
- The 1st place solution of 2022 Ego4d Natural Language Queries. ☆32 Updated 2 years ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆29 Updated 5 months ago
- Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model" ☆60 Updated last year
- ☆58 Updated last year
- Disentangled Pre-training for Human-Object Interaction Detection ☆20 Updated 5 months ago
- ☆61 Updated last year
- The benchmark for "Video Object Segmentation in Panoptic Wild Scenes". ☆12 Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆41 Updated last year
- CVPR2022 - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation ☆23 Updated 2 years ago
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023) ☆30 Updated last year
- Code for ECCV2022 Paper "Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection" ☆36 Updated 2 years ago
- [ECCV 2024 Oral] SPLAM: Accelerating Image Generation with Sub-path Linear Approximation Model ☆20 Updated 5 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆36 Updated 2 months ago
- [NeurIPS 2022] Official implementation of the paper "Rethinking Resolution in the Context of Efficient Video Recognition". ☆31 Updated 2 years ago
- ☆12 Updated 9 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 Updated last year
- Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639) ☆35 Updated 2 years ago
- OVAD: Open-vocabulary Attribute Detection code ☆29 Updated last year
- Official code for "Opening up Open World Tracking" (CVPR 2022) ☆56 Updated 2 years ago