Parsing-free RAG supported by VLMs
☆912Dec 7, 2025Updated 2 months ago
Alternatives and similar repositories for VisRAG
Users that are interested in VisRAG are comparing it to the libraries listed below
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.☆2,523Feb 19, 2026Updated last week
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆91Nov 15, 2024Updated last year
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents☆630Jan 11, 2026Updated last month
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆130Nov 6, 2024Updated last year
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation☆310Oct 18, 2024Updated last year
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆413Apr 22, 2025Updated 10 months ago
- This is the code repo for the paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards".☆24Oct 28, 2024Updated last year
- Empowering RAG with a memory-based data interface for all-purpose applications!☆2,210Sep 11, 2025Updated 5 months ago
- ⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)☆3,340Nov 26, 2025Updated 3 months ago
- Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine☆495Jul 23, 2025Updated 7 months ago
- A simple, easy-to-hack GraphRAG implementation☆3,686Jan 27, 2026Updated last month
- ☆58Oct 18, 2024Updated last year
- Multimodal Retrieval-augmented Generation Framework Built by Tongyi Lab, Alibaba Group.☆460Feb 17, 2026Updated last week
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding☆2,370May 30, 2025Updated 9 months ago
- 🔍 Search-o1: Agentic Search-Enhanced Large Reasoning Models [EMNLP 2025]☆1,172Nov 17, 2025Updated 3 months ago
- Retrieval and Retrieval-augmented LLMs☆11,329Dec 15, 2025Updated 2 months ago
- Solve Visual Understanding with Reinforced VLMs☆5,845Oct 21, 2025Updated 4 months ago
- Implementation and evaluation of multimodal RAG with text and image inputs for industrial applications☆68Nov 6, 2024Updated last year
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,…☆2,326May 25, 2024Updated last year
- [NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge a…☆3,237Sep 4, 2025Updated 5 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆8,084Feb 10, 2025Updated last year
- KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning a…☆8,574Jan 28, 2026Updated last month
- A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines☆5,230Feb 20, 2026Updated last week
- [CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents☆59May 26, 2025Updated 9 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction☆9,402Jan 3, 2025Updated last year
- ✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction☆2,490Mar 28, 2025Updated 11 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…☆398Aug 26, 2025Updated 6 months ago
- Code for our paper: "Building A Coding Assistant via Retrieval-Augmented Language Models"☆10Nov 2, 2024Updated last year
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance☆9,817Sep 22, 2025Updated 5 months ago
- A family of lightweight multimodal models.☆1,052Nov 18, 2024Updated last year
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction".☆70Jul 24, 2024Updated last year
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.☆24,478Aug 12, 2024Updated last year
- [EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"☆28,682Updated this week
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆3,845Updated this week
- Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL☆4,041Nov 13, 2025Updated 3 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code.☆845Jan 28, 2025Updated last year
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone☆23,942Updated this week
- Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.☆18,386Jan 30, 2026Updated last month
- A modular graph-based Retrieval-Augmented Generation (RAG) system☆31,031Feb 20, 2026Updated last week