RachanaJayaram / Cross-Attention-VizWiz-VQALinks

A self-evident application of the VQA task is to design systems that aid blind people with sight reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and as such, highlights some of the challenges presented by this particular use case.

☆15

Alternatives and similar repositories for Cross-Attention-VizWiz-VQA

Users that are interested in Cross-Attention-VizWiz-VQA are comparing it to the libraries listed below

Sorting:

Dong-JinKim / DenseRelationalCaptioning
Code of Dense Relational Captioning
☆69Updated 2 years ago
HLR / Cross_Modality_Relevance
The source code of ACL 2020 paper: "Cross-Modality Relevance for Reasoning on Language and Vision"
☆27Updated 4 years ago
alasdairtran / transform-and-tell
[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning
☆91Updated last year
noagarcia / knowit-rock
ROCK model for Knowledge-Based VQA in Videos
☆30Updated 4 years ago
jamespark3922 / adv-inf
Adversarial Inference for Multi-Sentence Video Descriptions (CVPR 2019)
☆34Updated 5 years ago
fenglinliu98 / MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" （NeurIPS 2019）
☆65Updated 4 years ago
CCYChongyanChen / VQA_AlgorithmDatasets
☆38Updated 2 years ago
hyounghk / VideoQADenseCapFrameGate-ACL2020
Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T…
☆34Updated 5 years ago
ezeli / BUTD_model
A pytorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
☆47Updated 3 years ago
fawazsammani / show-edit-tell
Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020
☆80Updated 4 years ago
qinzzz / Multimodal-Alignment-Framework
Implementation for MAF: Multimodal Alignment Framework
☆46Updated 4 years ago
Gitsamshi / WeakVRD-Captioning
Implementation of paper "Improving Image Captioning with Better Use of Caption"
☆32Updated 4 years ago
aioz-ai / ICCV19_VQA-CTI
Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)
☆38Updated 2 years ago
PhoebusSi / SAR
Code for our ACL2021 paper: "Check It Again: Progressive Visual Question Answering via Visual Entailment"
☆31Updated 3 years ago
yanxinzju / CSS-VQA
Counterfactual Samples Synthesizing for Robust VQA
☆78Updated 2 years ago
mad-red / VSR-guided-CIC
Human-like Controllable Image Captioning with Verb-specific Semantic Roles.
☆36Updated 3 years ago
MILVLG / mt-captioning
A PyTorch implementation of the paper Multimodal Transformer with Multiview Visual Representation for Image Captioning
☆25Updated 4 years ago
Zhiquan-Wen / D-VQA
PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)
☆25Updated 2 years ago
husthuaan / AAT
Code for paper "Adaptively Aligned Image Captioning via Adaptive Attention Time". NeurIPS 2019
☆50Updated 5 years ago
YuanEZhou / Grounded-Image-Captioning
☆63Updated 3 years ago
CrossmodalGroup / SSL-VQA
Code for our IJCAI2020 paper: Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
☆51Updated 4 years ago
marcopede / AreasOfAttention
☆10Updated 7 years ago
entalent / MemCap
code for paper `MemCap: Memorizing Style Knowledge for Image Captioning`
☆11Updated 5 years ago
Sha-Lab / CMHSE
The code repository for "Cross-Modal and Hierarchical Modeling of Video and Text" in PyTorch
☆16Updated 6 years ago
wh0330 / CAG_VisDial
☆15Updated 4 years ago
ruotianluo / coco-caption
☆67Updated 2 years ago
lukemelas / image-paragraph-captioning
[EMNLP 2018] Training for Diversity in Image Paragraph Captioning
☆89Updated 5 years ago
SeleenaJM / CapEval
An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019)
☆38Updated 5 years ago
GeraldHan / GGE
Code for Greedy Gradient Ensemble for Visual Question Answering （ICCV 2021, Oral）
☆26Updated 3 years ago
zhegan27 / LXMERT-AdvTrain
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT…
☆21Updated 4 years ago