RachanaJayaram / Cross-Attention-VizWiz-VQA
A self-evident application of the VQA task is to design systems that aid blind people with sight-reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and, as such, highlights some of the challenges presented by this particular use case.
☆15 · Updated 2 years ago
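For readers unfamiliar with the mechanism the repository's name refers to, below is a minimal, illustrative sketch (PyTorch) of question-to-image cross-attention: a question embedding attends over a set of image region features and returns a fused vector. This is not the repository's actual code; all dimensions, class names, and layer choices here are assumptions made for the example.

```python
# Illustrative sketch only (not the repository's implementation): one
# cross-attention step where a question embedding attends over image
# region features. Dimensions and names below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionToImageCrossAttention(nn.Module):
    def __init__(self, q_dim=768, v_dim=2048, hidden=512):
        super().__init__()
        self.query = nn.Linear(q_dim, hidden)   # project question embedding
        self.key = nn.Linear(v_dim, hidden)     # project image region features
        self.value = nn.Linear(v_dim, hidden)

    def forward(self, q_emb, img_regions):
        # q_emb: (batch, q_dim); img_regions: (batch, num_regions, v_dim)
        q = self.query(q_emb).unsqueeze(1)              # (batch, 1, hidden)
        k = self.key(img_regions)                       # (batch, R, hidden)
        v = self.value(img_regions)                     # (batch, R, hidden)
        scores = torch.matmul(q, k.transpose(1, 2)) / (k.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)                # weights over regions
        return torch.matmul(attn, v).squeeze(1)         # (batch, hidden)

# Example: a 768-d question vector attending over 36 region features.
layer = QuestionToImageCrossAttention()
fused = layer(torch.randn(2, 768), torch.randn(2, 36, 2048))
print(fused.shape)  # torch.Size([2, 512])
```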
Alternatives and similar repositories for Cross-Attention-VizWiz-VQA
Users that are interested in Cross-Attention-VizWiz-VQA are comparing it to the libraries listed below
- The source code of ACL 2020 paper: "Cross-Modality Relevance for Reasoning on Language and Vision" ☆27 · Updated 4 years ago
- Code of Dense Relational Captioning ☆69 · Updated 2 years ago
- Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019) ☆65 · Updated 5 years ago
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT… ☆21 · Updated 5 years ago
- Official code and dataset link for "VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles" ☆36 · Updated 4 years ago
- Implementation for MAF: Multimodal Alignment Framework ☆46 · Updated 5 years ago
- Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T… ☆34 · Updated 5 years ago
- Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020 ☆81 · Updated 5 years ago
- ☆44 · Updated 6 months ago
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning ☆93 · Updated last year
- [EMNLP 2018] Training for Diversity in Image Paragraph Captioning ☆91 · Updated 6 years ago
- An implementation that downstreams pre-trained V+L models to VQA tasks. Now supports VisualBERT, LXMERT, and UNITER ☆165 · Updated 3 years ago
- ☆68 · Updated 3 years ago
- Video captioning baseline models on Video2Commonsense Dataset. ☆57 · Updated 4 years ago
- [ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset ☆90 · Updated 2 years ago
- ROCK model for Knowledge-Based VQA in Videos ☆31 · Updated 5 years ago
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER… ☆119 · Updated 4 years ago
- [CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias ☆127 · Updated 4 years ago
- Compact Trilinear Interaction for Visual Question Answering (ICCV 2019) ☆38 · Updated 3 years ago
- ☆64 · Updated 3 years ago
- Code for paper "Adaptively Aligned Image Captioning via Adaptive Attention Time", NeurIPS 2019 ☆51 · Updated 6 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles ☆36 · Updated 3 years ago
- An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019) ☆37 · Updated 5 years ago
- Code for our IJCAI 2020 paper: Overcoming Language Priors with Self-supervised Learning for Visual Question Answering ☆52 · Updated 5 years ago
- Code for paper `MemCap: Memorizing Style Knowledge for Image Captioning` ☆11 · Updated 5 years ago
- Adversarial Inference for Multi-Sentence Video Descriptions (CVPR 2019) ☆34 · Updated 6 years ago
- [ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering ☆132 · Updated 3 years ago
- Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral) ☆134 · Updated last year
- Implementation for "Large-scale Pretraining for Visual Dialog" https://arxiv.org/abs/1912.02379 ☆97 · Updated 5 years ago
- Counterfactual Samples Synthesizing for Robust VQA ☆79 · Updated 3 years ago