LinWeizheDragon/Retrieval-Augmented-Visual-Question-Answering

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/LinWeizheDragon/Retrieval-Augmented-Visual-Question-Answering)

LinWeizheDragon / Retrieval-Augmented-Visual-Question-Answering

This is the official repository for Retrieval Augmented Visual Question Answering

☆252

Alternatives and similar repositories for Retrieval-Augmented-Visual-Question-Answering

Users that are interested in Retrieval-Augmented-Visual-Question-Answering are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

LinWeizheDragon / FLMR
View on GitHub
The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.
☆108May 30, 2025Updated last year
PhoebusSi / Thinking-while-Observing
View on GitHub
Code for our ACL-2023 paper: "Combo of Thinking and Observing for Outside-Knowledge VQA"
☆12Jun 30, 2023Updated 3 years ago
jialinwu17 / MAVEX
View on GitHub
☆30Dec 16, 2022Updated 3 years ago
guoyang9 / UnifER
View on GitHub
Official implementation for the MM'22 paper.
☆14Jun 30, 2022Updated 4 years ago
Go2Heart / EchoSight
View on GitHub
[EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
☆90Jan 19, 2026Updated 6 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
PaulLerner / ViQuAE
View on GitHub
Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev…
☆39Dec 19, 2024Updated last year
guilk / KAT
View on GitHub
Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"
☆71Jul 11, 2022Updated 4 years ago
Yushi-Hu / PromptCap
View on GitHub
natual language guided image captioning
☆89Feb 11, 2024Updated 2 years ago
AndersonStra / MuKEA
View on GitHub
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
☆101Mar 30, 2023Updated 3 years ago
aimagelab / ReflectiVA
View on GitHub
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
☆56Jul 14, 2025Updated last year
yuanze-lin / REVIVE
View on GitHub
[NeurIPS 2022] Official code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
☆105Apr 6, 2025Updated last year
xinwei666 / MMGenerativeIR
View on GitHub
Official Code of our AAAI-24 Paper: "Generative Multi-modal Knowledge Retrieval with Large Language Models".
☆28Sep 15, 2025Updated 10 months ago
open-vision-language / infoseek
View on GitHub
☆78Oct 27, 2023Updated 2 years ago
edchengg / infoseek_eval
View on GitHub
EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions
☆26May 30, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
alirezasalemi7 / DEDR-MM-FiD
View on GitHub
the code for paper: A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering
☆14Aug 22, 2023Updated 2 years ago
TIGER-AI-Lab / UniIR
View on GitHub
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆183Oct 1, 2024Updated last year
Omaralsaabi / M3DOCRAG
View on GitHub
An implementation of "M3DOCRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding" by Jaemin Cho, Debanj…
☆56Nov 13, 2024Updated last year
prdwb / okvqa-release
View on GitHub
☆15May 10, 2021Updated 5 years ago
1429904852 / KEF
View on GitHub
[COLING 2022] Learning from Adjective-Noun Pairs: A Knowledge-enhanced Framework for Target-Oriented Multimodal Sentiment Classification
☆14Apr 19, 2023Updated 3 years ago
MILVLG / prophet
View on GitHub
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
☆278Jun 14, 2025Updated last year
luomancs / ReMuQ
View on GitHub
a multimodal retrieval dataset
☆25Jul 8, 2023Updated 3 years ago
jingchenchen / ReasoningConsistency-VQA
View on GitHub
☆13Aug 14, 2022Updated 3 years ago
jlian2 / mucko
View on GitHub
Pytorch Implementation of MUCKO(2020 IJCAI)
☆20Oct 25, 2020Updated 5 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
alexandrosXe / A-Simple-Baseline-For-Knowledge-Based-VQA
View on GitHub
Repo for the EMNLP 2023 paper "A Simple Knowledge-Based Visual Question Answering"
☆25Dec 14, 2023Updated 2 years ago
RitaRamo / extra
View on GitHub
Retrieval-augmented Image Captioning
☆13Feb 16, 2023Updated 3 years ago
China-UK-ZSL / ZS-F-VQA
View on GitHub
[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph
☆72Feb 9, 2024Updated 2 years ago
rabiulcste / vqazero
View on GitHub
visual question answering prompting recipes for large vision-language models
☆29Sep 14, 2024Updated last year
sanket0211 / WK-VQA
View on GitHub
World Knowledge Based Visual Question Answering
☆22Nov 26, 2020Updated 5 years ago
val-iisc / RMLVQA
View on GitHub
☆19May 31, 2023Updated 3 years ago
hackerchenzhuo / LaKo
View on GitHub
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
☆24Feb 9, 2024Updated 2 years ago
yuanqing-ai / LLM-Hierarchical-Consistency
View on GitHub
Official implementation of "Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck" [CVPR'26]
☆16Nov 10, 2025Updated 8 months ago
mayubo2333 / MMLongBench-Doc
View on GitHub
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆149Sep 28, 2025Updated 9 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
NJUDeepEngine / CAEF
View on GitHub
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11Oct 11, 2024Updated last year
JerryWisdom / react-vqa-master
View on GitHub
基于 React + router + redux + axios 和 Flask + MySQL + Pytorch 的视觉问答管理系统
☆10Dec 12, 2022Updated 3 years ago
Alibaba-NLP / OmniSearch
View on GitHub
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
☆429Apr 22, 2025Updated last year
aditya10 / VLC-BERT
View on GitHub
Code for WACV 2023 paper "VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge"
☆21May 8, 2023Updated 3 years ago
aioz-ai / CFR_VQA
View on GitHub
Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22)
☆49Apr 22, 2026Updated 3 months ago
HITsz-TMG / SKURG
View on GitHub
☆20Nov 4, 2023Updated 2 years ago
GanjinZero / RAMM
View on GitHub
Codes and Pre-trained models for RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [ACM MM 202…
☆29Nov 2, 2023Updated 2 years ago