aiming-lab/MDocAgent

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

☆353

Alternatives and similar repositories for MDocAgent

Users who are interested in MDocAgent are comparing it to the libraries listed below.

nttmdlab-nlp / VDocRAG
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
☆66 • May 26, 2025 • Updated last year
Forlorin / DMAP
☆15 • Jan 18, 2026 • Updated 6 months ago
bloomberg / m3docrag
☆71 • May 19, 2025 • Updated last year
MananSuri27 / VisDoM
☆45 • Jul 28, 2025 • Updated last year
aiming-lab / EduVisAgent
[ICLR'26] EduVisAgent: A Benchmark and Multi-Agent Framework for Pedagogical Visualization
☆30 • Aug 5, 2025 • Updated 11 months ago
ag2ai / SimpleDoc
☆41 • Jan 9, 2026 • Updated 6 months ago
Aeryn666 / RegionRAG
[AAAI2026] Source code for RegionRAG
☆24 • Apr 20, 2026 • Updated 3 months ago
OpenBMB / VisRAG
Parsing-free RAG supported by VLMs
☆975 • Jul 17, 2026 • Updated last week
Gzy1112 / MHier-RAG
☆37 • Apr 1, 2026 • Updated 3 months ago
dengc2023 / LongDocURL
☆42 • Apr 6, 2026 • Updated 3 months ago
Lillianwei-h / MMIE
[ICLR'25 Oral] MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
☆35 • Nov 3, 2024 • Updated last year
mayubo2333 / MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆150 • Sep 28, 2025 • Updated 10 months ago
aiming-lab / MMedPO
[ICML'25] MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
☆74 • Jun 5, 2025 • Updated last year
ocean-luna / HMRAG
[ACM MM2025] Official code of "HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation"
☆111 • Jul 23, 2025 • Updated last year
DataArcTech / RagVL
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆92 • Nov 15, 2024 • Updated last year
llm-lab-org / Multimodal-RAG-Survey
A Survey on Multimodal Retrieval-Augmented Generation
☆537 • Feb 20, 2026 • Updated 5 months ago
aiming-lab / ClawArena
☆58 • Jul 1, 2026 • Updated 3 weeks ago
microsoft / PixelCraft
[ICLR 2026] High-Fidelity Visual Reasoning on Structured Images
☆30 • Jul 17, 2026 • Updated last week
Alibaba-NLP / VRAG
Multimodal Retrieval-augmented Generation Framework Built by Tongyi Lab, Alibaba Group.
☆970 • Apr 29, 2026 • Updated 3 months ago
deepglint / UniDoc-RL
UniDoc-RL: Unified Document Understanding with Reinforcement Learning
☆16 • May 21, 2026 • Updated 2 months ago
yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆40 • May 26, 2025 • Updated last year
Omaralsaabi / M3DOCRAG
An implementation of "M3DOCRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding" by Jaemin Cho, Debanj…
☆56 • Nov 13, 2024 • Updated last year
puar-playground / Self-Visual-RAG
Implementation of MLLM-based Self-Vision-RAG models
☆15 • Nov 30, 2025 • Updated 7 months ago
PKU-AICare / ConfAgents
ConfAgents: A Conformal-Guided Multi-Agent Framework for Cost-Efficient Medical Diagnosis
☆15 • Jul 22, 2026 • Updated last week
aiming-lab / Agent0
[COLM'26 & ICML'26] Agent0 Series: Self-Evolving Agents from Zero Data
☆1,236 • Jul 10, 2026 • Updated 2 weeks ago
richard-peng-xia / MMed-RAG
[ICLR'25] MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
☆337 • Jan 22, 2025 • Updated last year
aiming-lab / SkillRL
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
☆909 • May 17, 2026 • Updated 2 months ago
Alibaba-NLP / OmniSearch
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
☆429 • Apr 22, 2025 • Updated last year
RAG-Gym / RAG-Gym
Official repository for RAG-Gym
☆126 • Jul 14, 2026 • Updated 2 weeks ago
SPORT-Agents / SPORT-Agents
☆22 • Dec 18, 2025 • Updated 7 months ago
Leon1207 / Video-RAG-master
✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…
☆448 • Jun 26, 2026 • Updated last month
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆68 • Oct 19, 2024 • Updated last year
haon-chen / mmE5
☆59 • Feb 27, 2025 • Updated last year
zhaosuifeng / FinRAGBench-V
FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain (EMNLP 2025)
☆20 • Jan 13, 2026 • Updated 6 months ago
RUC-NLPIR / Search-o1
🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025]
☆1,240 • Nov 17, 2025 • Updated 8 months ago
Da1yuqin / EviNoteRAG
Welcome! 😊 This is the official code release of EviNote-RAG, and we’re happy to share it with the community.
☆48 • Jun 4, 2026 • Updated last month
TacoSkill / Skillbolt
A universal skill ecosystem for AI agents.
☆25 • Feb 6, 2026 • Updated 5 months ago
opendatalab / DocLayout-YOLO
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
☆2,236 • Apr 14, 2025 • Updated last year
illuin-tech / colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
☆2,710 • Jul 13, 2026 • Updated 2 weeks ago