Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training.
☆92Nov 15, 2024Updated last year
Alternatives and similar repositories for RagVL
Users that are interested in RagVL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding☆37Jul 22, 2025Updated 9 months ago
- The official repository of MM-R5☆29Jun 22, 2025Updated 10 months ago
- [EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information☆12Oct 11, 2024Updated last year
- We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…☆22Jan 11, 2026Updated 3 months ago
- ☆71Oct 27, 2023Updated 2 years ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- [EMNLP25 Main]The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"☆26Mar 30, 2026Updated 3 weeks ago
- Parsing-free RAG supported by VLMs☆947Dec 7, 2025Updated 4 months ago
- [ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"☆45Feb 27, 2026Updated 2 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification☆50Mar 24, 2025Updated last year
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning☆19Nov 22, 2025Updated 5 months ago
- ☆10Dec 16, 2023Updated 2 years ago
- ☆15Jul 8, 2024Updated last year
- RuleRAG: Rule Meets Retrieval-Augmented Generation for Question Answering☆32Apr 23, 2026Updated last week
- ☆59Feb 27, 2025Updated last year
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities☆43Jun 7, 2025Updated 10 months ago
- [EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.☆82Jan 19, 2026Updated 3 months ago
- [ICLR 2025] Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models☆61Jan 22, 2025Updated last year
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation"☆15Aug 26, 2025Updated 8 months ago
- [ACM MM2025] The official repository for the RealSyn dataset☆40Dec 14, 2025Updated 4 months ago
- ☆35Oct 9, 2025Updated 6 months ago
- Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM | EMNLP 2025 Findings☆18Oct 17, 2025Updated 6 months ago
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions☆25May 30, 2024Updated last year
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]☆635Updated this week
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- E5-V: Universal Embeddings with Multimodal Large Language Models☆275Dec 10, 2025Updated 4 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆108May 30, 2025Updated 10 months ago
- ☆13Apr 23, 2025Updated last year
- [MM '25] This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts".☆44Sep 27, 2025Updated 7 months ago
- Retrieval-augmented Image Captioning☆13Feb 16, 2023Updated 3 years ago
- LLM2CLIP significantly improves already state-of-the-art CLIP models.☆651Feb 1, 2026Updated 2 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024]☆76Jan 13, 2025Updated last year
- ☆63Jan 3, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆422Apr 22, 2025Updated last year
- A Survey on Medical Report Generation: From Deep Neural Networks to Large Language Models☆31Feb 23, 2024Updated 2 years ago
- ☆17Feb 20, 2025Updated last year
- Code and Data for "FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation" (ACL25)☆32Oct 26, 2025Updated 6 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…☆49Aug 21, 2024Updated last year
- [AAAI'25] SPRING: Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models☆26Sep 24, 2025Updated 7 months ago
- Official Implementation of CODE☆16Sep 26, 2024Updated last year