TencentARC / UMTLinks

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

☆232

Alternatives and similar repositories for UMT

Users that are interested in UMT are comparing it to the libraries listed below

Sorting:

jayleicn / moment_detr
[NeurIPS 2021] Moment-DETR code and QVHighlights dataset
☆335Updated last year
wjun0830 / QD-DETR
Official pytorch repository for "QD-DETR : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection" (CVPR 2023 …
☆242Updated 3 months ago
wjun0830 / CGDETR
Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr…
☆142Updated last year
layer6ai-labs / xpool
https://layer6ai-labs.github.io/xpool/
☆131Updated 2 years ago
microsoft / SwinBERT
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
☆246Updated 3 years ago
linjieli222 / HERO_Video_Feature_Extractor
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆115Updated 4 years ago
TencentARC / MCQ
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
☆141Updated 3 years ago
ju-chen / Efficient-Prompt
☆193Updated 3 years ago
foolwood / DRL
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval
☆97Updated 3 years ago
OpenGVLab / unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
☆341Updated last year
xuguohai / X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
☆178Updated last year
yawenzeng / Awesome-Cross-Modal-Video-Moment-Retrieval
前沿论文持续更新--视频时刻定位 or 时域语言定位 or 视频片段检索。
☆257Updated 2 years ago
LiuRicky / ts2_net
[ECCV 2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
☆78Updated 3 years ago
CryhanFang / CLIP2Video
☆256Updated 2 years ago
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆107Updated 10 months ago
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281Updated 2 years ago
ttengwang / PDVC
End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
☆226Updated last year
m-bain / frozen-in-time
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
☆378Updated 3 years ago
OpenGVLab / efficient-video-recognition
☆180Updated 3 years ago
xyzforever / BEVT
PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529
☆164Updated 3 years ago
antoyang / TubeDETR
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
☆190Updated 2 years ago
sming256 / AdaTAD
[CVPR2024] The official implementation of AdaTAD: End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
☆40Updated last year
showlab / UniVTG
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
☆370Updated last year
Soldelli / MAD
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
☆169Updated 2 years ago
houzhijian / CONQUER
[2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval
☆43Updated 4 years ago
jssprz / video_captioning_datasets
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Pe…
☆131Updated 2 years ago
TXH-mercury / VALOR
[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
☆305Updated 11 months ago
r-cui / ViGA
"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022
☆69Updated 3 years ago
26hzhang / ReLoCLNet
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)
☆58Updated 4 years ago
farewellthree / STAN
Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"
☆107Updated last year