Jielin-Qiu / MMSum_model
[CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
☆30 Updated 7 months ago
Related projects
Alternatives and complementary repositories for MMSum_model
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The… ☆51 Updated 4 months ago
- Video Graph Transformer for Video Question Answering (ECCV'22) ☆44 Updated last year
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023) ☆21 Updated 4 months ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆91 Updated last year
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆17 Updated 8 months ago
- A PyTorch implementation of EmpiricalMVM ☆39 Updated 10 months ago
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023 ☆55 Updated 2 weeks ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆50 Updated 2 months ago
- The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023) ☆72 Updated last year
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21) ☆129 Updated 3 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆44 Updated 4 months ago
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval". CVPR 2022 ☆94 Updated 2 years ago
- Code for paper, "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" ECCV 2022 ☆36 Updated last year
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" NeurIPS 23 Spotlight ☆35 Updated last year
- ☆37 Updated 4 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆58 Updated 4 months ago
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ☆90 Updated 4 months ago
- [CVPR'23 Highlight] AutoAD: Movie Description in Context. ☆87 Updated last week
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆40 Updated 4 months ago
- Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding" ☆46 Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval" ☆11 Updated 2 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 Updated 3 months ago
- Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop) ☆29 Updated 10 months ago
- https://layer6ai-labs.github.io/xpool/ ☆114 Updated last year
- ☆33 Updated 10 months ago
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023) ☆58 Updated 9 months ago
- [ICCV 2023] Accurate and Fast Compressed Video Captioning ☆34 Updated 8 months ago
- [2023 ACL] CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding ☆28 Updated last year
- "Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022. ☆64 Updated 2 years ago
- Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (… ☆49 Updated 5 months ago