This is the official repository for Retrieval Augmented Visual Question Answering
☆247Dec 19, 2024Updated last year
Alternatives and similar repositories for Retrieval-Augmented-Visual-Question-Answering
Users who are interested in Retrieval-Augmented-Visual-Question-Answering are comparing it to the libraries listed below.
- Code for our ACL-2023 paper: "Combo of Thinking and Observing for Outside-Knowledge VQA"☆12Jun 30, 2023Updated 2 years ago
- ☆30Dec 16, 2022Updated 3 years ago
- [EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.☆81Jan 19, 2026Updated 2 months ago
- Official implementation for the MM'22 paper.☆14Jun 30, 2022Updated 3 years ago
- Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev…☆38Dec 19, 2024Updated last year
- Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"☆70Jul 11, 2022Updated 3 years ago
- natural language guided image captioning☆87Feb 11, 2024Updated 2 years ago
- [NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering☆105Apr 6, 2025Updated 11 months ago
- MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering☆100Mar 30, 2023Updated 2 years ago
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions☆25May 30, 2024Updated last year
- Official Code of our AAAI-24 Paper: "Generative Multi-modal Knowledge Retrieval with Large Language Models".☆28Sep 15, 2025Updated 6 months ago
- [CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering☆54Jul 14, 2025Updated 8 months ago
- ☆68Oct 27, 2023Updated 2 years ago
- the code for paper: A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering☆13Aug 22, 2023Updated 2 years ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆178Oct 1, 2024Updated last year
- An implementation of "M3DOCRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding" by Jaemin Cho, Debanj…☆52Nov 13, 2024Updated last year
- ☆14May 10, 2021Updated 4 years ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆129Sep 28, 2025Updated 5 months ago
- [COLING 2022] Learning from Adjective-Noun Pairs: A Knowledge-enhanced Framework for Target-Oriented Multimodal Sentiment Classification☆14Apr 19, 2023Updated 2 years ago
- Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".☆279Jun 14, 2025Updated 9 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆91Nov 15, 2024Updated last year
- Repo for the EMNLP 2023 paper "A Simple Knowledge-Based Visual Question Answering"☆25Dec 14, 2023Updated 2 years ago
- Retrieval-augmented Image Captioning☆13Feb 16, 2023Updated 3 years ago
- ☆13Aug 14, 2022Updated 3 years ago
- [Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph☆72Feb 9, 2024Updated 2 years ago
- visual question answering prompting recipes for large vision-language models☆28Sep 14, 2024Updated last year
- [Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection☆25Feb 9, 2024Updated 2 years ago
- ☆18May 31, 2023Updated 2 years ago
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"☆38Oct 11, 2024Updated last year
- Visual question answering management system based on React + router + redux + axios and Flask + MySQL + Pytorch☆10Dec 12, 2022Updated 3 years ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆416Apr 22, 2025Updated 11 months ago
- Code for WACV 2023 paper "VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge"☆21May 8, 2023Updated 2 years ago
- Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22)☆49Nov 3, 2022Updated 3 years ago
- Codes and Pre-trained models for RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [ACM MM 202…☆29Nov 2, 2023Updated 2 years ago
- ☆43Aug 15, 2023Updated 2 years ago
- ☆14Oct 14, 2019Updated 6 years ago
- ☆152Oct 12, 2022Updated 3 years ago
- [ECCV2022] Rethinking Data Augmentation for Robust Visual Question Answering☆13Nov 23, 2022Updated 3 years ago
- [ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale☆124Sep 2, 2024Updated last year