RachanaJayaram / Cross-Attention-VizWiz-VQA
A natural application of the VQA task is designing systems that assist blind people with sight-reliant queries. The VizWiz VQA dataset is built from images and questions collected from members of the visually impaired community and, as such, highlights some of the challenges specific to this use case.
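The repository name suggests a cross-attention model over question and image features. As a rough illustration only (not this repository's actual architecture), a minimal single-head cross-attention layer in PyTorch, where question tokens attend over image region features, might look like this; all dimensions and names below are hypothetical:

```python
import torch
import torch.nn as nn


class CrossAttention(nn.Module):
    """Minimal single-head cross-attention: question tokens query image regions."""

    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # projects question tokens to queries
        self.k_proj = nn.Linear(dim, dim)  # projects image regions to keys
        self.v_proj = nn.Linear(dim, dim)  # projects image regions to values
        self.scale = dim ** -0.5

    def forward(self, question, image):
        # question: (batch, n_tokens, dim); image: (batch, n_regions, dim)
        q = self.q_proj(question)
        k = self.k_proj(image)
        v = self.v_proj(image)
        # attention weights: (batch, n_tokens, n_regions)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # each question token becomes a weighted sum of image-region values
        return attn @ v  # (batch, n_tokens, dim)


# Toy usage: a 5-token question attending over 36 image region features.
layer = CrossAttention(dim=64)
out = layer(torch.randn(1, 5, 64), torch.randn(1, 36, 64))
print(out.shape)  # torch.Size([1, 5, 64])
```

The attended output has one fused vector per question token, which a VQA head would typically pool and classify over an answer vocabulary.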
Related projects:
- Code of Dense Relational Captioning
- Transform and Tell: Entity-Aware News Image Captioning (CVPR 2020)
- Source code of the ACL 2020 paper "Cross-Modality Relevance for Reasoning on Language and Vision"
- Show, Edit and Tell: A Framework for Editing Image Captions (CVPR 2020)
- Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
- Implementation of MAF: Multimodal Alignment Framework
- Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles
- ROCK model for Knowledge-Based VQA in Videos
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
- A PyTorch implementation of "Multimodal Transformer with Multiview Visual Representation for Image Captioning"
- Implementation of "Improving Image Captioning with Better Use of Caption"
- Code for "Adaptively Aligned Image Captioning via Adaptive Attention Time" (NeurIPS 2019)