MRHiSum / MR.HiSum

☆25

Related projects

Alternatives and complementary repositories for MR.HiSum

j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆91 Updated last year
JacobChalk / TIM
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
☆37 Updated this week
TengdaHan / AutoAD
[CVPR'23 Highlight] AutoAD: Movie Description in Context.
☆87 Updated this week
fmu2 / snag_release
Official Implementation of SnAG (CVPR 2024)
☆35 Updated 2 weeks ago
Soldelli / MAD
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
☆149 Updated last year
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆44 Updated 4 months ago
schowdhury671 / meerkat
☆18 Updated last month
md-mohaiminul / ViS4mer
☆52 Updated 2 years ago
wjun0830 / CGDETR
Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr…
☆116 Updated 2 months ago
klauscc / VindLU
☆101 Updated last year
layer6ai-labs / xpool
https://layer6ai-labs.github.io/xpool/
☆114 Updated last year
jinhyunj / EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆48 Updated last year
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆58 Updated 9 months ago
medhini / Instructional-Video-Summarization
Code for paper, "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" ECCV 2022
☆36 Updated last year
TencentYoutuResearch / HighlightDetection-CLC
Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies"
☆17 Updated last year
facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆90 Updated 4 months ago
farewellthree / STAN
Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"
☆98 Updated 9 months ago
ninatu / everything_at_once
Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval". CVPR 2022
☆94 Updated 2 years ago
linjieli222 / HERO_Video_Feature_Extractor
Video Feature Extraction Code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆97 Updated 3 years ago
xuguohai / X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
☆136 Updated 7 months ago
jayleicn / singularity
[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
☆130 Updated last year
jssprz / video_captioning_datasets
Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Pe…
☆116 Updated last year
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆39 Updated 2 months ago
LiuRicky / ts2_net
[ECCV2022] A pytorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
☆76 Updated last year
HopLee6 / Sports-QA
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
☆28 Updated 10 months ago
klauscc / TALLFormer
☆50 Updated last year
GeWu-Lab / MUSIC-AVQA
MUSIC-AVQA, CVPR2022 (ORAL)
☆67 Updated last year
HuiGuanLab / ms-sl
Source code of our MM'22 paper Partially Relevant Video Retrieval
☆51 Updated last week
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆85 Updated last year
mzhaoshuai / CenterCLIP
[SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p…
☆125 Updated 2 years ago