aranciokov / FSMMDA_VideoRetrievalLinks
☆10Updated 2 years ago
Alternatives and similar repositories for FSMMDA_VideoRetrieval
Users that are interested in FSMMDA_VideoRetrieval are comparing it to the libraries listed below
Sorting:
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Updated 3 years ago
- Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"☆13Updated 2 years ago
- ☆22Updated 3 years ago
- Recent Advances in Visual Dialog☆30Updated 3 years ago
- Implementation for the paper "Dynamic Language Binding in Relational Visual Reasoning" (Le et al., IJCAI 2020)☆13Updated last year
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration☆56Updated 2 years ago
- Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…☆10Updated last year
- ☆79Updated 3 years ago
- The official code for "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" (IEEE Access, 2021…☆17Updated 3 years ago
- Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"☆35Updated 3 years ago
- This repository contains code for the paper 'Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation'.☆18Updated 3 years ago
- [ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages"☆49Updated 3 years ago
- ☆30Updated 3 years ago
- PyTorch Implementation on Paper [CVPR2021]Distilling Audio-Visual Knowledge by Compositional Contrastive Learning☆89Updated 4 years ago
- Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"☆43Updated 4 years ago
- ☆108Updated 3 years ago
- Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"☆26Updated 3 years ago
- DeVLBert: Learning Deconfounded Visio-Linguistic Representations☆27Updated 3 years ago
- Video Graph Transformer for Video Question Answering (ECCV'22)☆49Updated 2 years ago
- Source code of our TCSVT'22 paper Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval☆19Updated 3 years ago
- CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment☆22Updated 3 years ago
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)☆43Updated 3 years ago
- Implementation for the paper "Reliable Visual Question Answering Abstain Rather Than Answer Incorrectly" (ECCV 2022: https//arxiv.org/abs…☆38Updated 2 years ago
- [EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction☆51Updated 3 years ago
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The…☆77Updated 7 months ago
- [EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue PyTorch Implementation☆13Updated 2 years ago
- [Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection☆26Updated last year
- End-to-end Multi-modal Video Temporal Grounding, NeurIPS 2021☆18Updated 4 years ago
- ☆25Updated 3 years ago
- This repo contains code for Invariant Grounding for Video Question Answering☆27Updated 2 years ago