nttmdlab-nlp/VDocRAG

[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

☆66

Alternatives and similar repositories for VDocRAG

Users who are interested in VDocRAG are comparing it to the libraries listed below.

Mungeryang / colqwen3
The code used to train and run inference with the ColQwen3 model. Welcome to follow and star! ⭐️⭐️⭐️ https://huggingface.co/goodman2001/…
☆15 · Jul 4, 2026 · Updated 3 weeks ago
bloomberg / m3docrag
☆71 · May 19, 2025 · Updated last year
MananSuri27 / VisDoM
☆45 · Jul 28, 2025 · Updated last year
WxxShirley / MoLoRAG
[EMNLP 2025] Official implementation for paper "MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval"
☆27 · Mar 17, 2026 · Updated 4 months ago
OpenBMB / VisRAG
Parsing-free RAG supported by VLMs
☆975 · Jul 17, 2026 · Updated last week
Forlorin / DMAP
☆15 · Jan 18, 2026 · Updated 6 months ago
aiming-lab / MDocAgent
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
☆353 · Aug 8, 2025 · Updated 11 months ago
XLearning-SCU / 2026-CVPR-BML
[CVPR 2026] PyTorch Code for the paper "Bootstrapping Multi-view Learning for Test-time Noisy Correspondence"
☆15 · Jul 1, 2026 · Updated 3 weeks ago
ag2ai / SimpleDoc
☆41 · Jan 9, 2026 · Updated 6 months ago
vec-ai / wikiHow-TIIR
[ACL 2025] Towards Text-Image Interleaved Retrieval
☆16 · Sep 3, 2025 · Updated 10 months ago
yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆40 · May 26, 2025 · Updated last year
JarvisUSTC / Awesome-Multimodal-RAG
A curated list of the latest advancements, papers, tools, and datasets for **Multimodal Retrieval-Augmented Generation (RAG)**. Multimoda…
☆53 · Nov 25, 2025 · Updated 8 months ago
ocean-luna / HMRAG
[ACM MM2025] Official code of "HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation"
☆111 · Jul 23, 2025 · Updated last year
Gzy1112 / MHier-RAG
☆37 · Apr 1, 2026 · Updated 3 months ago
yh-hust / DocSeeker
[CVPR 2026 Highlight] DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
☆18 · Jun 4, 2026 · Updated last month
wgcyeo / UniversalRAG
[ACL 2026 Oral] UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
☆174 · Jun 24, 2026 · Updated last month
mayubo2333 / MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆150 · Sep 28, 2025 · Updated 10 months ago
mzsun01 / MM-LDM
☆11 · Apr 12, 2024 · Updated 2 years ago
R2MED / R2MED
A Benchmark for Reasoning-Driven Retrieval in Medicine
☆19 · Apr 12, 2026 · Updated 3 months ago
sdsxdxl / DecEx-RAG
Accepted by EMNLP 2025 Industry Track
☆21 · Oct 6, 2025 · Updated 9 months ago
aimagelab / ReT
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
☆37 · Sep 12, 2025 · Updated 10 months ago
paperClub-hub / chinese_clip
Chinese CLIP: supports custom datasets and extracts vectors from text and images to perform text-image matching.
☆21 · Sep 14, 2022 · Updated 3 years ago
MiliLab / REX-RAG
Official repo for "REX-RAG: Reasoning Exploration with Policy Correction in Retrieval-Augmented Generation"
☆35 · Sep 28, 2025 · Updated 10 months ago
SalesforceAIResearch / UniDoc-Bench
☆38 · Jun 2, 2026 · Updated last month
facebookresearch / MetaEmbed
[ICLR 2026 Oral] Official Implementation of the paper "MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interactio…
☆18 · Jul 2, 2026 · Updated 3 weeks ago
Alibaba-NLP / ViDoRAG
[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
☆669 · Jan 11, 2026 · Updated 6 months ago
KU-HIAI / Ko-Gemma
☆34 · Feb 27, 2024 · Updated 2 years ago
illuin-tech / colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
☆2,710 · Jul 13, 2026 · Updated 2 weeks ago
EvolvingLMMs-Lab / multimodal-search-r1
[ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal…
☆470 · Apr 7, 2026 · Updated 3 months ago
DataArcTech / RagVL
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆92 · Nov 15, 2024 · Updated last year
ritaranx / AceSearcher
This is the code repo for the paper AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play (NeurIPS 2025 Spotl…
☆25 · Sep 29, 2025 · Updated 10 months ago
LARS-research / TREFE
Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022
☆13 · Nov 25, 2022 · Updated 3 years ago
yeezhu / UNIT
PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.
☆34 · Sep 26, 2024 · Updated last year
yujunhuics / Reyes
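2025.01: Implemented a multimodal large model from scratch and named it Reyes (睿视; R for 睿, "wise", and eyes for 眼, "eyes"). Reyes has 8B parameters; the vision encoder is InternViT-300M-448px-V2_5 and the language model is Qwen2.5-7B-Instruct. Reyes also uses a two…
☆34 · Feb 10, 2026 · Updated 5 months ago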
omron-sinicx / scipostlayout
☆25 · Jul 31, 2024 · Updated last year
ihdia / seamformer
Official repository accompanying the ICDAR 2023 paper
☆14 · Oct 3, 2023 · Updated 2 years ago
EIT-NLP / Layer_Select_Fuse_for_MLLM
[CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practi…
☆48 · Oct 29, 2025 · Updated 9 months ago
MrZilinXiao / AutoVER
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14 · Mar 2, 2024 · Updated 2 years ago
zhengxuJosh / Awesome-RAG-Vision
Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision
☆339 · Jan 25, 2026 · Updated 6 months ago