aranciokov / FSMMDA_VideoRetrieval
☆10 · Updated last year
Alternatives and similar repositories for FSMMDA_VideoRetrieval
Users who are interested in FSMMDA_VideoRetrieval are comparing it to the libraries listed below.
- [ECCV'22 Poster] Explicit Image Caption Editing ☆22 · Updated 2 years ago
- Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models" ☆34 · Updated 2 years ago
- Recent Advances in Visual Dialog ☆30 · Updated 3 years ago
- ☆22 · Updated 3 years ago
- Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin… ☆10 · Updated last year
- The official code for "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" (IEEE Access, 2021… ☆17 · Updated 3 years ago
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The… ☆73 · Updated 5 months ago
- Implementation for the paper "Dynamic Language Binding in Relational Visual Reasoning" (Le et al., IJCAI 2020) ☆13 · Updated last year
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration ☆56 · Updated 2 years ago
- ☆79 · Updated 3 years ago
- [CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos ☆35 · Updated 9 months ago
- ☆30 · Updated 2 years ago
- Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog" ☆13 · Updated 2 years ago
- Video Graph Transformer for Video Question Answering (ECCV'22) ☆48 · Updated 2 years ago
- ☆27 · Updated 4 years ago
- [ACM MM 2021 Oral] Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation ☆40 · Updated 4 years ago
- Source code of our TCSVT'22 paper Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval ☆19 · Updated 3 years ago
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations ☆27 · Updated 2 years ago
- ☆46 · Updated 3 years ago
- ☆40 · Updated 2 years ago
- Official implementation for the MM'22 paper. ☆13 · Updated 3 years ago
- Implementation for the paper "Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly" (ECCV 2022: https://arxiv.org/abs… ☆38 · Updated 2 years ago
- ☆106 · Updated 3 years ago
- Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching" ☆26 · Updated 3 years ago
- This repository contains code for the paper 'Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation'. ☆17 · Updated 3 years ago
- Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers" ☆43 · Updated 3 years ago
- End-to-end Multi-modal Video Temporal Grounding, NeurIPS 2021 ☆18 · Updated 4 years ago
- MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering ☆99 · Updated 2 years ago
- PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning ☆89 · Updated 4 years ago
- PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners ☆115 · Updated 3 years ago