RachanaJayaram / Cross-Attention-VizWiz-VQA
A self-evident application of the VQA task is to design systems that aid blind people with sight-reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and, as such, highlights some of the challenges presented by this particular use case.
☆15Updated last year
Alternatives and similar repositories for Cross-Attention-VizWiz-VQA
Users that are interested in Cross-Attention-VizWiz-VQA are comparing it to the libraries listed below
- ☆38Updated 2 years ago
- Code of Dense Relational Captioning☆69Updated 2 years ago
- Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T…☆34Updated 5 years ago
- ☆44Updated 2 years ago
- Microsoft COCO Caption Evaluation Tool - Python 3☆33Updated 6 years ago
- The source code of ACL 2020 paper: "Cross-Modality Relevance for Reasoning on Language and Vision"☆27Updated 4 years ago
- A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning☆25Updated 4 years ago
- Implementation for MAF: Multimodal Alignment Framework☆46Updated 4 years ago
- Code for our ACL2021 paper: "Check It Again: Progressive Visual Question Answering via Visual Entailment"☆31Updated 3 years ago
- ROCK model for Knowledge-Based VQA in Videos☆30Updated 4 years ago
- Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)☆38Updated 2 years ago
- A reading list of papers about Visual Question Answering.☆32Updated 2 years ago
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration☆56Updated last year
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning☆90Updated last year
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles.☆36Updated 3 years ago
- Official code and dataset link for "VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles"☆36Updated 3 years ago
- Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)☆65Updated 4 years ago
- ☆67Updated 2 years ago
- Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020☆80Updated 4 years ago
- Code for ViLBERTScore in EMNLP Eval4NLP☆18Updated 2 years ago
- Unpaired Image Captioning☆36Updated 4 years ago
- Controllable image captioning model with unsupervised modes☆21Updated 2 years ago
- Implementation of paper "Improving Image Captioning with Better Use of Caption"☆32Updated 4 years ago
- An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019)☆38Updated 5 years ago
- [CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias☆121Updated 3 years ago
- This repository contains code used in our ACL'20 paper History for Visual Dialog: Do we really need it?☆34Updated 2 years ago
- ☆29Updated 2 years ago
- Implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering☆30Updated last year
- [ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering☆129Updated 2 years ago
- PyTorch code for ROLL, a knowledge-based video story question answering model.☆21Updated 4 years ago