zhengxuJosh / Awesome-RAG-Vision
Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision
☆144Updated last week
Alternatives and similar repositories for Awesome-RAG-Vision:
Users who are interested in Awesome-RAG-Vision are comparing it to the repositories listed below
- Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and Dee…☆52Updated last month
- FlexRAG: A RAG Framework for Information Retrieval and Generation.☆163Updated last week
- A Survey on Multimodal Retrieval-Augmented Generation☆156Updated 2 weeks ago
- Collect every awesome work about r1!☆356Updated last week
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆310Updated 2 weeks ago
- Awesome LLM pre-training resources, including data, frameworks, and methods.☆136Updated last week
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents☆462Updated last month
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆232Updated 2 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆123Updated 6 months ago
- Customize your arXiv recommendation every day.☆100Updated last month
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆52Updated last month
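- This series aims to let readers understand and implement, from scratch, every component of a large language model, along with the training and fine-tuning code, using only basic PyTorch and no other off-the-shelf external libraries. Readers need only Python, PyTorch, and the most basic deep learning background.☆249Updated 2 months ago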
- ☆173Updated 3 months ago
- Agentic RAG R1 Framework via Reinforcement Learning☆130Updated this week
- ☆53Updated 2 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning.☆513Updated 3 weeks ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆43Updated this week
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding☆137Updated last month
- Collection of papers and repos for multimodal chain-of-thought☆81Updated 6 months ago
- Latest Advances on Long Chain-of-Thought Reasoning☆273Updated 3 weeks ago
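- Use free LLM APIs together with your private-domain data to generate SFT training data (entirely free of charge). Supports the training data formats of tools such as llamafactory. Synthetic data.☆159Updated 5 months ago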
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation☆119Updated this week
- The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.☆31Updated 2 weeks ago
- Search, organize, discover anything!☆49Updated last year
- Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-…☆289Updated 10 months ago
- The official GitHub page for the survey paper "A Survey on Data Augmentation in Large Model Era"☆124Updated 9 months ago
- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval☆157Updated 3 weeks ago
- Chrome / Edge extension to turn arXiv papers into Markdown codes in one click.☆78Updated last month
- ThinkLLM: 🚀 Lightweight, efficient implementations of large language model algorithms☆45Updated this week
- Qwen GRPO Graph Extraction RL Finetune☆46Updated last month