muzairkhattak / ViFi-CLIP
[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".
☆248 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for ViFi-CLIP
- [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models ☆295 · Updated 5 months ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites" ☆260 · Updated 10 months ago
- [CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments". ☆226 · Updated 5 months ago
- ☆169 · Updated 2 years ago
- ☆187 · Updated 2 years ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆245 · Updated 10 months ago
- [CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv… ☆110 · Updated last year
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering ☆178 · Updated 10 months ago
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023] ☆108 · Updated last year
- An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval" ☆136 · Updated 7 months ago
- Foundation Models for Video Understanding: A Survey ☆97 · Updated 2 months ago
- [ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer ☆294 · Updated 7 months ago
- [CVPR2023] All in One: Exploring Unified Video-Language Pre-training ☆280 · Updated last year
- Official Open Source code for "Scaling Language-Image Pre-training via Masking" ☆407 · Updated last year
- Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models ☆186 · Updated 10 months ago
- ☆106 · Updated 9 months ago
- [CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》 ☆149 · Updated last year
- Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023] ☆89 · Updated 6 months ago
- The suite of modeling video with Mamba ☆238 · Updated 6 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆297 · Updated 4 months ago
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆298 · Updated 5 months ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆98 · Updated 9 months ago
- [NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary … ☆284 · Updated 2 years ago
- Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval" ☆46 · Updated this week
- Awesome papers & datasets specifically focused on long-term videos. ☆212 · Updated this week
- [CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding ☆289 · Updated 5 months ago
- ☆289 · Updated 9 months ago
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆244 · Updated 4 months ago
- OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch. ☆187 · Updated last month
- Densely Captioned Images (DCI) dataset repository. ☆159 · Updated 4 months ago