Jason-Qiu / MMSum_model

[CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

☆30

Related projects:

wxh1996 / VideoAgent
☆29 · Updated 2 months ago
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆34 · Updated last month
bcmi / Causal-VidQA
[CVPR 2022] A large-scale public benchmark dataset for video question answering, with an emphasis on evidence and commonsense reasoning. The…
☆50 · Updated 2 months ago
scofield7419 / Video-of-Thought
Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆34 · Updated 2 months ago
medhini / Instructional-Video-Summarization
Code for the paper "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" (ECCV 2022)
☆34 · Updated last year
ByZ0e / Glance-Focus
Source code for "Glance and Focus: Memory Prompting for Multi-Event Video Question Answering" (accepted at NeurIPS 2023)
☆19 · Updated 2 months ago
TXH-mercury / COSA
Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆38 · Updated last year
j-min / HiREST
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆88 · Updated 11 months ago
sail-sg / VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
☆44 · Updated last year
doc-doc / CoVGT
Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)
☆16 · Updated 6 months ago
aimagelab / pacscore
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023)
☆51 · Updated last year
boheumd / A2Summ
The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)
☆68 · Updated last year
ninatu / everything_at_once
Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval" (CVPR 2022)
☆93 · Updated 2 years ago
jpthu17 / DiCoSA
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
☆44 · Updated 5 months ago
showlab / mist
☆30 · Updated 9 months ago
tsujuifu / pytorch_empirical-mvm
A PyTorch implementation of EmpiricalMVM
☆39 · Updated 9 months ago
doc-doc / NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
☆120 · Updated last month
kevinliang888 / IVR-QA-baselines
[ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers
☆11 · Updated 5 months ago
brown-palm / AntGPT
Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
☆18 · Updated 6 months ago
RERV / UniAdapter
[ICLR 2024] The official implementation of the paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …
☆68 · Updated 7 months ago
kkahatapitiya / LangRepo
Language Repository for Long Video Understanding
☆27 · Updated 3 months ago
cwj1412 / MSCOCO-Flikcr30K_FG
Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023)
☆21 · Updated last year
StanfordVL / atp-video-language
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…
☆47 · Updated 3 months ago
acherstyx / CoCap
[ICCV 2023] Accurate and Fast Compressed Video Captioning
☆33 · Updated 7 months ago
Ziyang412 / UCoFiA
PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
☆50 · Updated 3 months ago
waybarrios / guidance-based-video-grounding
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
☆15 · Updated last year
Becomebright / GroundVQA
Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.
☆49 · Updated last week
yanbeic / CCL
PyTorch implementation of the paper [CVPR 2021] "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"
☆86 · Updated 3 years ago
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆38 · Updated 3 months ago
UARK-AICV / VLTinT
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
☆64 · Updated 7 months ago