aiming-lab / MDocAgent
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
☆145 · Updated last month
Alternatives and similar repositories for MDocAgent
Users who are interested in MDocAgent are comparing it to the repositories listed below.
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆74 · Updated last month
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆68 · Updated 2 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆166 · Updated 2 weeks ago
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision ☆151 · Updated 2 weeks ago
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization ☆131 · Updated 4 months ago
- An open platform for enhancing the capability of LLMs in workflow orchestration. ☆142 · Updated 2 months ago
- 🌐 WebWalker: Benchmarking LLMs in Web Traversal ☆396 · Updated 2 weeks ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆319 · Updated 3 weeks ago
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token" ☆224 · Updated last month
- recursive rag with r1 reasoning ☆294 · Updated 2 months ago
- [EMNLP 2024] LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering ☆103 · Updated 3 months ago
- Agentic RAG R1 Framework via Reinforcement Learning ☆148 · Updated last week
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation" ☆142 · Updated last month
- GUI Grounding for Professional High-Resolution Computer Use ☆200 · Updated 2 months ago
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆54 · Updated last month
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso… ☆107 · Updated last month
- The evaluation benchmark on MCP servers ☆106 · Updated 2 weeks ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents ☆201 · Updated last month
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆363 · Updated last month
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆125 · Updated 6 months ago
- ☆94 · Updated 5 months ago
- ☆93 · Updated 2 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆144 · Updated 8 months ago
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆472 · Updated last month
- Official code for Dynamic Parametric RAG. ☆112 · Updated last week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆521 · Updated 3 weeks ago
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation ☆152 · Updated this week
- ☆145 · Updated 3 months ago
- Search, organize, discover anything! ☆48 · Updated last year
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the… ☆93 · Updated last month