LikeGiver / VideoRAG
A tiny project to test the effectiveness of video QA through RAG techniques and multimodal LLMs
☆14 · Updated 10 months ago
Alternatives and similar repositories for VideoRAG:
Users interested in VideoRAG are comparing it to the libraries listed below.
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 6 months ago
- ☆81 · Updated 10 months ago
- ☆41 · Updated 5 months ago
- ☆35 · Updated last month
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆50 · Updated last week
- The Hugging Face implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆82 · Updated 2 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆118 · Updated 4 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆65 · Updated 4 months ago
- ☆31 · Updated 2 months ago
- ☆47 · Updated last month
- Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆49 · Updated 2 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆285 · Updated 2 weeks ago
- [CVPR2025] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆177 · Updated 2 weeks ago
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆34 · Updated 5 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆98 · Updated last month
- Search, organize, discover anything! ☆48 · Updated 11 months ago
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding ☆71 · Updated last week
- ☆73 · Updated last year
- ☆28 · Updated 6 months ago
- ☆84 · Updated last month
- This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension" ☆165 · Updated last month
- ☆19 · Updated 2 weeks ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 · Updated 5 months ago
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆148 · Updated 2 weeks ago
- [preprint] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆43 · Updated 3 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆113 · Updated 4 months ago
- A Simple Framework of Small-scale Large Multimodal Models for Video Understanding Based on TinyLLaVA_Factory. ☆46 · Updated 2 weeks ago
- Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆118 · Updated last week
- ☆16 · Updated 4 months ago
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆10 · Updated 5 months ago