Alibaba-NLP / VRAG
Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning"
☆382 Updated 2 weeks ago
Alternatives and similar repositories for VRAG
Users who are interested in VRAG are comparing it to the repositories listed below.
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆387 Updated 6 months ago
- Agentic RAG R1 Framework via Reinforcement Learning ☆307 Updated last month
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆340 Updated 2 months ago
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding ☆237 Updated 2 months ago
- Parsing-free RAG supported by VLMs ☆832 Updated last week
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆569 Updated 6 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆631 Updated 2 weeks ago
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation ☆455 Updated this week
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆279 Updated this week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆649 Updated 2 months ago
- Collect every awesome work about r1! ☆422 Updated 5 months ago
- ☆367 Updated 2 weeks ago
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆588 Updated 4 months ago
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆205 Updated last month
- A Survey on Multimodal Retrieval-Augmented Generation ☆398 Updated 2 weeks ago
- Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision ☆249 Updated 2 weeks ago
- MiroMind Research Agent: Fully Open-Source Deep Research Agent with Reproducible State-of-the-Art Performance on FutureX, GAIA, HLE, Brow… ☆794 Updated last week
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. ☆473 Updated last month
- FlexRAG: A RAG Framework for Information Retrieval and Generation. ☆225 Updated 4 months ago
- ✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi… ☆321 Updated last month
- The development and future prospects of large multimodal reasoning models. ☆526 Updated 2 months ago
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆229 Updated 5 months ago
- ☆883 Updated last week
- PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World ☆294 Updated 5 months ago
- Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning" ☆131 Updated 5 months ago
- ☆414 Updated 2 weeks ago
- ☆1,050 Updated last week
- GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents ☆347 Updated 2 months ago
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train … ☆218 Updated 4 months ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆166 Updated 3 weeks ago