zhengxuJosh / Awesome-RAG-Vision
Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision
☆297Updated last week
Alternatives and similar repositories for Awesome-RAG-Vision
Users who are interested in Awesome-RAG-Vision are comparing it to the repositories listed below
- The development and future prospects of large multimodal reasoning models.☆572Updated last week
- Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce…☆425Updated 3 months ago
- Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and Dee…☆61Updated 9 months ago
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding☆280Updated 5 months ago
- ✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…☆382Updated this week
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning☆297Updated 3 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…☆380Updated 4 months ago
- Customize your arXiv recommendation every day.☆139Updated 3 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning.☆573Updated 9 months ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆65Updated 8 months ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆87Updated 7 months ago
- Collection of papers and repos for multimodal chain-of-thought☆89Updated last year
- Collect every awesome work about r1!☆426Updated 8 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆128Updated last year
- FlexRAG: A RAG Framework for Information Retrieval and Generation.☆229Updated last week
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆407Updated 8 months ago
- This repository collects papers on VLLM applications. We will update new papers irregularly.☆198Updated 3 weeks ago
- A Survey on Multimodal Retrieval-Augmented Generation☆459Updated last week
- Reading List of Memory Augmented Multimodal Research, including multimodal context modeling, memory in vision and robotics, and external …☆52Updated last year
- ☆33Updated last year
- [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents☆619Updated this week
- Train a Language Model with GRPO to create a schedule from a list of events and priorities☆255Updated 8 months ago
- ☆1,069Updated last month
- Multimodal Large Models: A New Generation of AI Technology Paradigm (《多模态大模型:新一代人工智能技术范式》), by Liu Yang and Lin Liang☆257Updated last year
- The paper list of "Memory in the Age of AI Agents: A Survey"☆767Updated last week
- Collections of Papers and Projects for Multimodal Reasoning.☆107Updated 8 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆232Updated 2 months ago
- 📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.☆381Updated last week
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆933Updated 2 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆268Updated last month