danieljf24 / awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
☆593 · Updated last year
Related projects
Alternatives and complementary repositories for awesome-video-text-retrieval
- An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval" ☆881 · Updated 7 months ago
- [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning… ☆705 · Updated last year
- An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation" ☆338 · Updated 3 months ago
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21] ☆350 · Updated 2 years ago
- Multi-Modal Transformer for Video Retrieval ☆258 · Updated last month
- ☆231 · Updated last year
- [NeurIPS 2021] Moment-DETR code and QVHighlights dataset ☆271 · Updated 7 months ago
- Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset ☆261 · Updated 5 months ago
- Code for the HowTo100M paper ☆252 · Updated 4 years ago
- The Paper List of Large Multi-Modality Model, Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matchi… ☆401 · Updated 4 months ago
- Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning" ☆237 · Updated 2 years ago
- Recent Advances in Vision and Language Pre-training (VLP) ☆288 · Updated last year
- Multi-modality pre-training ☆471 · Updated 6 months ago
- [ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models ☆295 · Updated 5 months ago
- Video embeddings for retrieval with natural language queries ☆336 · Updated last year
- METER: A Multimodal End-to-end TransformER Framework ☆362 · Updated 2 years ago
- This repository focuses on Image Captioning & Video Captioning & Seq-to-Seq Learning & NLP ☆413 · Updated 2 years ago
- X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022) ☆449 · Updated last year
- [CVPR2023] All in One: Exploring Unified Video-Language Pre-training ☆280 · Updated last year
- Recent Advances in Vision and Language PreTrained Models (VL-PTMs) ☆1,140 · Updated 2 years ago
- Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training" ☆230 · Updated 3 years ago
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts ☆185 · Updated 2 years ago
- UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or … ☆192 · Updated 7 months ago
- A Survey on multimodal learning research. ☆315 · Updated last year
- Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022 ☆260 · Updated last month
- Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset ☆239 · Updated 8 months ago
- An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval" ☆136 · Updated 7 months ago
- Code accompanying the paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning". ☆209 · Updated 4 years ago
- Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020) ☆226 · Updated last year
- [ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding ☆322 · Updated 6 months ago
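
Many of the retrieval repositories above (e.g. CLIP4Clip, X-CLIP, Frozen in Time) share the same basic paradigm: encode video frames and text into a joint embedding space and rank videos by cosine similarity to the query. The sketch below illustrates only that shared idea; it assumes Hugging Face transformers' CLIP as a stand-in backbone and the parameter-free mean pooling studied in CLIP4Clip, and is not taken from any of the listed repos' official code.

```python
# Minimal sketch of CLIP-style video-text retrieval via mean-pooled frame embeddings.
# Assumes the `transformers` and `Pillow` packages; not the official CLIP4Clip implementation.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_video(frames: list[Image.Image]) -> torch.Tensor:
    """Mean-pool per-frame CLIP embeddings into a single video embedding."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        frame_feats = model.get_image_features(**inputs)        # (num_frames, dim)
    frame_feats = frame_feats / frame_feats.norm(dim=-1, keepdim=True)
    video_feat = frame_feats.mean(dim=0)                        # temporal pooling, no extra parameters
    return video_feat / video_feat.norm()

def encode_texts(queries: list[str]) -> torch.Tensor:
    """Encode text queries into the same embedding space."""
    inputs = processor(text=queries, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_feats = model.get_text_features(**inputs)          # (num_queries, dim)
    return text_feats / text_feats.norm(dim=-1, keepdim=True)

# Retrieval: rank candidate videos by cosine similarity to the query.
# `frames_per_video` is a hypothetical list of lists of PIL frames sampled from each video.
# video_feats = torch.stack([encode_video(f) for f in frames_per_video])  # (num_videos, dim)
# scores = encode_texts(["a dog catching a frisbee"]) @ video_feats.T     # higher = better match
```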