Alibaba-NLP/VRAG

Multimodal Retrieval-augmented Generation Framework Built by Tongyi Lab, Alibaba Group.

☆970

Alternatives and similar repositories for VRAG

Users who are interested in VRAG are comparing it to the repositories listed below.

EvolvingLMMs-Lab / multimodal-search-r1
[ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal…
☆470 · Apr 7, 2026 · Updated 3 months ago
Alibaba-NLP / ViDoRAG
[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
☆669 · Jan 11, 2026 · Updated 6 months ago
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,150 · Nov 13, 2025 · Updated 8 months ago
Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
☆19,714 · Feb 27, 2026 · Updated 4 months ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,081 · Updated this week
OpenBMB / VisRAG
Parsing-free RAG supported by VLMs
☆972 · Jul 17, 2026 · Updated last week
Visual-Agent / DeepEyes
☆1,250 · Nov 20, 2025 · Updated 8 months ago
Tongyi-Zhiwen / Qwen-Doc
☆548 · May 25, 2026 · Updated last month
llm-lab-org / Multimodal-RAG-Survey
A Survey on Multimodal Retrieval-Augmented Generation
☆533 · Feb 20, 2026 · Updated 5 months ago
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,937 · Updated this week
mayubo2333 / MMLongBench-Doc
Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
☆149 · Sep 28, 2025 · Updated 9 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,649 · Updated this week
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆422 · Jan 29, 2026 · Updated 5 months ago
illuin-tech / colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
☆2,706 · Jul 13, 2026 · Updated last week
Alibaba-NLP / MaskSearch
Repo for "MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability"
☆155 · May 27, 2025 · Updated last year
nttmdlab-nlp / VDocRAG
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
☆66 · May 26, 2025 · Updated last year
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,013 · Jul 7, 2026 · Updated 2 weeks ago
QwenLM / Qwen3-VL-Embedding
☆1,338 · Jun 23, 2026 · Updated last month
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,662 · Jan 30, 2026 · Updated 5 months ago
DataArcTech / RagVL
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆92 · Nov 15, 2024 · Updated last year
Aeryn666 / RegionRAG
[AAAI2026] Source code for RegionRAG
☆24 · Apr 20, 2026 · Updated 3 months ago
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493 · Mar 9, 2026 · Updated 4 months ago
XMUDeepLIT / UME-R1
The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).
☆69 · Feb 25, 2026 · Updated 4 months ago
HKUDS / RAG-Anything
"RAG-Anything: All-in-One RAG Framework"
☆22,395 · Updated this week
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,583 · Jun 14, 2025 · Updated last year
Alibaba-NLP / OmniSearch
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
☆429 · Apr 22, 2025 · Updated last year
aiming-lab / MDocAgent
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
☆352 · Aug 8, 2025 · Updated 11 months ago
OpenBMB / UltraRAG
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
☆5,659 · Updated this week
ByteDance-Seed / m3-agent
☆1,423 · Feb 12, 2026 · Updated 5 months ago
open-compass / VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,299 · Updated this week
shawn0728 / OpenSearch-VL
🔍 OpenSearch-VL provides a fully open recipe for training strong multimodal deep search agents through high-quality data curation, diver…
☆256 · May 19, 2026 · Updated 2 months ago
agents-x-project / PyVision
[MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆162 · Jul 22, 2025 · Updated last year
zai-org / GLM-V
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆2,357 · Updated this week
Agent-RL / ReCall
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei…
☆1,412 · May 16, 2025 · Updated last year
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆4,065 · May 19, 2025 · Updated last year
QwenLM / Qwen-Agent
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
☆16,841 · Mar 4, 2026 · Updated 4 months ago
MMBrowseComp / MM-BrowseComp
☆70 · Jan 4, 2026 · Updated 6 months ago
Alibaba-NLP / ZeroSearch
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
☆1,307 · Aug 16, 2025 · Updated 11 months ago
LHRLAB / Graph-R1
[ICML 2026] Official resources of "Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning".
☆583 · Apr 30, 2026 · Updated 2 months ago