Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training.
☆91 · Nov 15, 2024 · Updated last year
Alternatives and similar repositories for RagVL
Users who are interested in RagVL are comparing it to the libraries listed below.
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆20 · Jan 11, 2026 · Updated last month
- The official repository of MM-R5 ☆28 · Jun 22, 2025 · Updated 8 months ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding ☆36 · Jul 22, 2025 · Updated 7 months ago
- ☆58 · Feb 27, 2025 · Updated last year
- ☆68 · Oct 27, 2023 · Updated 2 years ago
- ☆34 · Oct 9, 2025 · Updated 4 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆44 · Sep 24, 2024 · Updated last year
- [EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information ☆12 · Oct 11, 2024 · Updated last year
- Parsing-free RAG supported by VLMs ☆912 · Dec 7, 2025 · Updated 2 months ago
- [ICLR 2025] Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models ☆59 · Jan 22, 2025 · Updated last year
- [ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer. ☆14 · Mar 2, 2024 · Updated last year
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning ☆19 · Nov 22, 2025 · Updated 3 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆17 · Oct 17, 2025 · Updated 4 months ago
- [EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval" ☆20 · Sep 12, 2025 · Updated 5 months ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts". ☆44 · Sep 27, 2025 · Updated 5 months ago
- [ACM MM2025] The official repository for the RealSyn dataset ☆40 · Dec 14, 2025 · Updated 2 months ago
- RuleRAG: Rule Meets Retrieval-Augmented Generation for Question Answering ☆32 · Oct 8, 2025 · Updated 4 months ago
- [AAAI'25] SPRING: Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models ☆25 · Sep 24, 2025 · Updated 5 months ago
- ☆67 · Aug 14, 2025 · Updated 6 months ago
- ☆10 · Dec 16, 2023 · Updated 2 years ago
- [EMNLP 2025] The official implementation of "Zero-shot Multimodal Document Retrieval via Cross-Modal Question Generation" ☆15 · Aug 26, 2025 · Updated 6 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆49 · Mar 24, 2025 · Updated 11 months ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆47 · Aug 21, 2024 · Updated last year
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆46 · Jul 17, 2025 · Updated 7 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆274 · Dec 10, 2025 · Updated 2 months ago
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities ☆43 · Jun 7, 2025 · Updated 8 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ☆579 · Feb 11, 2026 · Updated 2 weeks ago
- The Hugging Face implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆104 · May 30, 2025 · Updated 8 months ago
- [CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering ☆54 · Jul 14, 2025 · Updated 7 months ago
- ChineseCLIP using online learning ☆13 · Nov 7, 2022 · Updated 3 years ago
- ☆14 · Dec 18, 2024 · Updated last year
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows ☆19 · Nov 4, 2025 · Updated 3 months ago
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025 ☆13 · Aug 25, 2025 · Updated 6 months ago
- A PyTorch implementation of Proxy Anchor Loss based on the CVPR 2020 paper "Proxy Anchor Loss for Deep Metric Learning" ☆11 · Jan 16, 2021 · Updated 5 years ago
- [ACM MM25] Official PyTorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding] ☆15 · Jul 15, 2025 · Updated 7 months ago
- For the RLHF learning environment of Koreans ☆25 · Sep 25, 2023 · Updated 2 years ago
- EMNLP 2023 - InfoSeek: A New VQA Benchmark Focusing on Visual Info-Seeking Questions ☆25 · May 30, 2024 · Updated last year
- This is the official repository for Retrieval Augmented Visual Question Answering ☆244 · Dec 19, 2024 · Updated last year