An implementation of "M3DOCRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding" by Jaemin Cho, Debanjan Mahata, Ozan Irsoy, Yujie He, and Mohit Bansal (UNC Chapel Hill & Bloomberg).
☆52Nov 13, 2024Updated last year
Alternatives and similar repositories for M3DOCRAG
Users who are interested in M3DOCRAG are comparing it to the repositories listed below.
- ☆65May 19, 2025Updated 10 months ago
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning☆28Sep 27, 2024Updated last year
- [ACL 2024 Oral] This is the code repo for our ACL'24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…☆39Jun 30, 2024Updated last year
- KDD 2024 AQA competition 2nd place solution☆12Jul 21, 2024Updated last year
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding☆311Aug 8, 2025Updated 7 months ago
- [NeurIPS 2024] Official Code for the Paper "Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning"☆28Apr 8, 2025Updated 11 months ago
- ☆33Dec 29, 2024Updated last year
- A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs).☆41May 22, 2025Updated 10 months ago
- ☆11Jan 19, 2025Updated last year
- [SIGIR 2025] Official impl. of "MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation…☆18Apr 15, 2025Updated 11 months ago
- ☆10Nov 12, 2024Updated last year
- [ICML 2025] Improving Planning of Agents for Long-Horizon Tasks☆26Oct 2, 2025Updated 5 months ago
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents☆645Jan 11, 2026Updated 2 months ago
- Diverse Demonstrations Improve In-context Compositional Generalization☆12Jul 7, 2023Updated 2 years ago
- ☆15Jan 23, 2025Updated last year
- ☆18Mar 31, 2024Updated last year
- Ranking-Consistent Language-Image Pretraining☆12Oct 24, 2025Updated 4 months ago
- A code☆29Jan 23, 2025Updated last year
- ☆18Jun 10, 2025Updated 9 months ago
- ☆12Feb 24, 2023Updated 3 years ago
- Bilibili video information crawler☆12Apr 19, 2018Updated 7 years ago
- ☆15Dec 14, 2024Updated last year
- CSU check-in, temporary leave, and check-out assistant☆12Aug 27, 2022Updated 3 years ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.☆2,563Updated this week
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- ☆10Feb 9, 2024Updated 2 years ago
- [IJCV] Progressive Visual Prompt Learning with Contrastive Feature Re-formation☆15Aug 10, 2024Updated last year
- UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities☆161May 21, 2025Updated 10 months ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems"☆63Feb 24, 2025Updated last year
- Instituto de Telecomunicações Deep Learning-based Point Cloud Codec☆11Jun 18, 2024Updated last year
- This repository contains the code of metric indexing for exact similarity search.☆12Jul 11, 2023Updated 2 years ago
- A list of multi-vector retrieval resources☆18May 29, 2024Updated last year
- Tool to train/test models on 3d point cloud segmentation☆10Jun 14, 2025Updated 9 months ago
- ☆32Apr 8, 2025Updated 11 months ago
- ☆16May 26, 2025Updated 9 months ago
- Parsing-free RAG supported by VLMs☆935Dec 7, 2025Updated 3 months ago
- Optical Character Recognition (OCR / HTR) using Transformers☆11Aug 20, 2022Updated 3 years ago
- Using image captions with LLM for zero-shot VQA☆18Mar 14, 2024Updated 2 years ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆22Nov 8, 2023Updated 2 years ago