A self-evident application of the VQA task is to design systems that aid blind people with sight reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and as such, highlights some of the challenges presented by this particular use case.
☆15Dec 12, 2023Updated 2 years ago
Alternatives and similar repositories for Cross-Attention-VizWiz-VQA
Users that are interested in Cross-Attention-VizWiz-VQA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆30Mar 24, 2018Updated 8 years ago
- A Light weight deep learning model with with a web application to answer image-based questions with a non-generative approach for the Viz…☆15Jun 27, 2023Updated 2 years ago
- PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind P…☆64Oct 17, 2018Updated 7 years ago
- ☆12Jun 18, 2024Updated last year
- Weakly Supervised Grounding for VQA in Vision-Language Transformers☆16May 6, 2023Updated 3 years ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- An updated PyTorch implementation of hengyuan-hu's version for 'Bottom-Up and Top-Down Attention for Image Captioning and Visual Question…☆34Mar 13, 2026Updated 2 months ago
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT…☆21Oct 20, 2020Updated 5 years ago
- Local self-attention in Transformer for visual question answering☆13Mar 17, 2024Updated 2 years ago
- Implementation for the journal paper "DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering" (Jianyu et al., IEEE Tran…☆18Jun 22, 2021Updated 4 years ago
- Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021☆19Jul 27, 2021Updated 4 years ago
- [Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection☆24Feb 9, 2024Updated 2 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"☆68Oct 11, 2021Updated 4 years ago
- A Chinese discourse parser based on CDTB☆14Jun 2, 2019Updated 7 years ago
- Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding☆33Aug 29, 2019Updated 6 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- PyTorch implementation of L-GCN [https://arxiv.org/abs/2008.09105]☆25Apr 25, 2021Updated 5 years ago
- Multilabel Out-of-Distribution Detection☆10Nov 23, 2020Updated 5 years ago
- Generative Bias for Robust Visual Question Answering ( CVPR 2023 )☆29Jul 4, 2023Updated 2 years ago
- [NAACL 2022] TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding☆10Jul 15, 2023Updated 2 years ago
- Character Grounding and Re-Identification in Story of Videos and Text Descriptions☆10Jan 17, 2021Updated 5 years ago
- Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T…☆34May 14, 2020Updated 6 years ago
- Repository to perform multi animal pose detection. In particular this code is used for bee pose estimation.☆10Jan 10, 2022Updated 4 years ago
- Chương trình dự đoán ngôn ngữ dựa vào văn bản(như Google Dịch ^^)☆12Aug 22, 2019Updated 6 years ago
- This is a code repository of Graphhopper: Multi-Hop Scene GraphReasoning for Visual Question Answering☆19Oct 30, 2021Updated 4 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- ☆11Oct 9, 2024Updated last year
- ☆10Aug 22, 2023Updated 2 years ago
- A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.☆20Feb 27, 2022Updated 4 years ago
- Memory-augmented Attention Modelling for Videos☆10Apr 24, 2017Updated 9 years ago
- ☆14Jul 13, 2021Updated 4 years ago
- EMNLP 2020: Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots☆12Dec 15, 2020Updated 5 years ago
- ☆17Dec 13, 2023Updated 2 years ago
- Background resampling for out-of-distribution detection☆13Mar 27, 2020Updated 6 years ago
- "What is Learned in Visually Grounded Neural Syntax Acquisition", Noriyuki Kojima, Hadar Averbuch-Elor, Alexander Rush and Yoav Artzi (AC…☆12Dec 30, 2021Updated 4 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- The code for reproducing "Frame Difference-Based Temporal Loss"☆11Sep 11, 2021Updated 4 years ago
- ☆13Jun 26, 2021Updated 4 years ago
- Leveraging Local and Global Patterns for Self-Attention Networks☆12Jun 3, 2019Updated 7 years ago
- video captioning using 3DCNN and LSTM (pytorch)☆11Sep 26, 2019Updated 6 years ago
- Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…☆13Jan 30, 2020Updated 6 years ago
- [NeurIPS 2019] Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries☆12Apr 15, 2022Updated 4 years ago
- Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations☆14Aug 16, 2022Updated 3 years ago