llm-lab-org / Multimodal-RAG-Survey
View external linksLinks

A Survey on Multimodal Retrieval-Augmented Generation

☆474

Alternatives and similar repositories for Multimodal-RAG-Survey

Users that are interested in Multimodal-RAG-Survey are comparing it to the libraries listed below

Sorting:

MananSuri27 / VisDoM
View on GitHub
☆39Jul 28, 2025Updated 6 months ago
Alibaba-NLP / VRAG
View on GitHub
Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce…
☆441Jan 13, 2026Updated last month
mragbench / MRAG-Bench
View on GitHub
[ICLR 2025] Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
☆59Jan 22, 2025Updated last year
Alibaba-NLP / ViDoRAG
View on GitHub
[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
☆626Jan 11, 2026Updated last month
DataArcTech / RagVL
View on GitHub
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆91Nov 15, 2024Updated last year
JerrryNie / ConceptCLIP
View on GitHub
☆22May 12, 2025Updated 9 months ago
EvolvingLMMs-Lab / multimodal-search-r1
View on GitHub
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…
☆392Aug 26, 2025Updated 5 months ago
PanguIR / MRAGSurvey
View on GitHub
A Survey of Multimodal Retrieval-Augmented Generation
☆20Nov 3, 2025Updated 3 months ago
hellangleZ / Qwen3_autothink_adapter
View on GitHub
Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere…
☆22May 9, 2025Updated 9 months ago
Alibaba-NLP / OmniSearch
View on GitHub
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
☆412Apr 22, 2025Updated 9 months ago
OpenSenseNova / SenseNova-MARS
View on GitHub
☆82Feb 5, 2026Updated last week
OpenBMB / VisRAG
View on GitHub
Parsing-free RAG supported by VLMs
☆910Dec 7, 2025Updated 2 months ago
yejinc00 / PREMIR
View on GitHub
[EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"
☆15Aug 26, 2025Updated 5 months ago
kleinercubs / ImgFact
View on GitHub
Beyond Entities: A Large-Scale Multi-Modal Knowledge Graph with Triplet Fact Grounding
☆11May 23, 2024Updated last year
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,337Feb 7, 2026Updated last week
Osilly / Vision-R1
View on GitHub
[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that…
☆760Jan 26, 2026Updated 2 weeks ago
hymie122 / RAG-Survey
View on GitHub
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-…
☆1,789Aug 20, 2024Updated last year
mayubo2333 / MMLongBench-Doc
View on GitHub
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆121Sep 28, 2025Updated 4 months ago
OoDBag / VisTA
View on GitHub
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
☆22May 31, 2025Updated 8 months ago
TerryPei / CSP
View on GitHub
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10Dec 15, 2024Updated last year
OpenCausaLab / MORE
View on GitHub
☆15Jan 9, 2026Updated last month
zjukg / KG-MM-Survey
View on GitHub
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
☆474Dec 10, 2024Updated last year
Omaralsaabi / M3DOCRAG
View on GitHub
An implementation of "M3DOCRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding" by Jaemin Cho, Debanj…
☆48Nov 13, 2024Updated last year
showlab / Awesome-MLLM-Hallucination
View on GitHub
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
☆979Sep 27, 2025Updated 4 months ago
AsteRiRi / SPPA
View on GitHub
Pytorch implementation of: "Continual Semantic Segmentation via Structure Preserving and Projected Feature Alignment", ECCV22
☆12Jul 22, 2022Updated 3 years ago
scofield7419 / MUIE-REAMO
View on GitHub
Code of the Grounded MUIE model, REAMO
☆11Dec 3, 2024Updated last year
AIoT-MLSys-Lab / MEDA
View on GitHub
[NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
☆17Jun 19, 2025Updated 7 months ago
tmlr-group / ZS-NTTA
View on GitHub
[ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models"
☆12Feb 22, 2025Updated 11 months ago
SPIRAL-MED / DiagnosisArena
View on GitHub
☆28Dec 4, 2025Updated 2 months ago
mbzuai-oryx / MIRA
View on GitHub
[ACM MM 2025 🔥🔥 ] MIRA: A first-of-its-kind medical RAG framework that fuses image features and retrieved knowledge with dynamic contex…
☆18Aug 28, 2025Updated 5 months ago
NEUIR / MemGraph
View on GitHub
[SIGIR 2025] This is the code repo for our SIGIR'25 paper: Enhancing the Patent Matching Capability of Large Language Models via Memory G…
☆18Apr 22, 2025Updated 9 months ago
ictnlp / FlexRAG
View on GitHub
FlexRAG: A RAG Framework for Information Retrieval and Generation.
☆233Jan 21, 2026Updated 3 weeks ago
TIGER-AI-Lab / VLM2Vec
View on GitHub
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
☆569Updated this week
Lackel / AGLA
View on GitHub
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆61Jul 16, 2024Updated last year
minglllli / CLIPFit
View on GitHub
[EMNLP 2024] Implementation of vision-language model fine-tuning via simple parameter-efficient modification
☆18Nov 24, 2024Updated last year
GasolSun36 / SURf
View on GitHub
[EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
☆12Oct 11, 2024Updated last year
chenyiqun / MMOA-RAG
View on GitHub
This is the code of MMOA-RAG.
☆102May 11, 2025Updated 9 months ago
EvolvingLMMs-Lab / open-r1-multimodal
View on GitHub
A fork to add multimodal model training to open-r1
☆1,449Feb 8, 2025Updated last year
DataArcTech / ToG
View on GitHub
This is the official github repo of Think-on-Graph (ICLR 2024). If you are interested in our work or willing to join our research team in…
☆623Mar 24, 2024Updated last year

llm-lab-org / Multimodal-RAG-SurveyView external linksLinks

Alternatives and similar repositories for Multimodal-RAG-Survey

llm-lab-org / Multimodal-RAG-Survey
View external linksLinks