aiming-lab / MDocAgent
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
☆127 Updated 3 weeks ago
Alternatives and similar repositories for MDocAgent:
Users who are interested in MDocAgent are comparing it to the repositories listed below.
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆72 Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆305 Updated this week
- Awesome-RAG-VIsion: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision ☆136 Updated last week
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆157 Updated this week
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token" ☆223 Updated last week
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆146 Updated 10 months ago
- Open replication of DeepSeek R1 for text-to-graph extraction. ☆93 Updated 2 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆123 Updated 5 months ago
- An open platform for enhancing the capability of LLMs in workflow orchestration. ☆133 Updated last month
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆64 Updated 2 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆282 Updated 2 weeks ago
- 【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling ☆116 Updated 6 months ago
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization ☆126 Updated 3 months ago
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train … ☆196 Updated 7 months ago
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆52 Updated 3 weeks ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆230 Updated 8 months ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆477 Updated last week
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆151 Updated 2 weeks ago
- [WWW 2025] A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System. ☆63 Updated last week
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆143 Updated 7 months ago
- Uses a blend of experimental techniques to enhance LLM RAG resultsets. ☆83 Updated last month
- [EMNLP 2024] LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering ☆102 Updated 2 months ago
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation" ☆138 Updated last month
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso… ☆101 Updated last month
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆152 Updated last month
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation ☆88 Updated last month
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆452 Updated last month