danieljf24 / awesome-video-text-retrieval

A curated list of deep learning resources for video-text retrieval.

☆638

Alternatives and similar repositories for awesome-video-text-retrieval

Users interested in awesome-video-text-retrieval are comparing it to the libraries listed below.


ArrowLuo / CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
☆1,009 · Updated last year
jayleicn / moment_detr
[NeurIPS 2021] Moment-DETR code and QVHighlights dataset
☆335 · Updated last year
jayleicn / ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning…
☆724 · Updated 2 years ago
microsoft / UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
☆363 · Updated last year
yawenzeng / Awesome-Cross-Modal-Video-Moment-Retrieval
Continuously updated cutting-edge papers: video moment localization, temporal language grounding, or video segment retrieval.
☆258 · Updated 2 years ago
CryhanFang / CLIP2Video
☆256 · Updated 2 years ago
microsoft / SwinBERT
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
☆246 · Updated 3 years ago
gabeur / mmt
Multi-Modal Transformer for Video Retrieval
☆264 · Updated last year
m-bain / frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
☆378 · Updated 3 years ago
TencentARC / UMT
UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or …
☆232 · Updated last year
albanie / collaborative-experts
Video embeddings for retrieval with natural language queries
☆342 · Updated 2 years ago
forence / Awesome-Visual-Captioning
This repository focuses on Image Captioning & Video Captioning & Seq-to-Seq Learning & NLP
☆414 · Updated 3 years ago
microsoft / XPretrain
Multi-modality pre-training
☆505 · Updated last year
Paranioar / Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretr…
☆435 · Updated 2 months ago
phellonchen / awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
☆296 · Updated 2 years ago
foolwood / DRL
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval
☆97 · Updated 3 years ago
antoine77340 / howto100m
Code for the HowTo100M paper
☆286 · Updated 5 years ago
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281 · Updated 2 years ago
zengyan-97 / X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
☆487 · Updated 3 years ago
jokieleung / awesome-visual-question-answering
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Common…
☆670 · Updated 2 years ago
ttengwang / PDVC
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
☆226 · Updated last year
OpenGVLab / unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
☆342 · Updated last year
linjieli222 / HERO_Video_Feature_Extractor
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆116 · Updated 4 years ago
v-iashin / video_features
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and T…
☆631 · Updated 10 months ago
TheShadow29 / awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
☆1,123 · Updated 2 months ago
jssprz / video_captioning_datasets
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Pe…
☆131 · Updated 2 years ago
linjieli222 / HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆235 · Updated 4 years ago
TXH-mercury / VALOR
[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
☆305 · Updated 11 months ago
layer6ai-labs / xpool
https://layer6ai-labs.github.io/xpool/
☆131 · Updated 2 years ago
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188 · Updated 7 months ago