JerryYLi / svitt
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
☆18Updated last year
Related projects ⓘ
Alternatives and complementary repositories for svitt
- Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)☆29Updated 10 months ago
- Offical PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated last year
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"☆61Updated 7 months ago
- ☆47Updated 2 years ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆39Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆29Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆45Updated 5 months ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆70Updated 9 months ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024☆46Updated last month
- ☆39Updated 5 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆22Updated 6 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆22Updated 5 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.☆51Updated 2 months ago
- A Unified Framework for Video-Language Understanding☆56Updated last year
- Turning to Video for Transcript Sorting☆46Updated last year
- Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies"☆17Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆22Updated 10 months ago
- ☆22Updated last year
- [ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, M…☆17Updated last month
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"☆54Updated last year
- ☆27Updated last year
- ☆54Updated 4 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆78Updated 8 months ago
- ☆41Updated 4 months ago
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings☆44Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆32Updated 5 months ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆22Updated 4 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆49Updated 5 months ago
- ☆52Updated 2 years ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆15Updated 9 months ago