IDEA-FinAI / RagVLLinks

Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training.

☆78

Alternatives and similar repositories for RagVL

Users that are interested in RagVL are comparing it to the libraries listed below

Sorting:

Liuziyu77 / MMDU
Official repository of MMDU dataset
☆92Updated 8 months ago
RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆104Updated 3 weeks ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆79Updated 4 months ago
LinWeizheDragon / FLMR
The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.
☆91Updated 3 weeks ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆168Updated 8 months ago
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆56Updated 9 months ago
RifleZhang / LLaVA-Reasoner-DPO
☆78Updated 5 months ago
Go2Heart / EchoSight
[EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.
☆63Updated last week
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆117Updated 7 months ago
TIGER-AI-Lab / UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆152Updated 8 months ago
Liuziyu77 / RAR
The official implementation of RAR
☆88Updated last year
swordlidev / Evaluation-Multimodal-LLMs-Survey
A Survey on Benchmarks of Multimodal Large Language Models
☆111Updated 3 months ago
UCSC-VLAA / VLAA-Thinking
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆123Updated 2 months ago
JUNJIE99 / VISTA_Evaluation_FineTuning
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c…
☆38Updated 7 months ago
NUS-TRAIL / NoisyRollout
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆69Updated 3 weeks ago
FreedomIntelligence / MLLM-Bench
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
☆69Updated 8 months ago
FuxiaoLiu / MMC
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
☆96Updated 5 months ago
XMUDeepLIT / LLaVE
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆60Updated last month
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆86Updated 6 months ago
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆80Updated 5 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆58Updated 5 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆65Updated 11 months ago
bzluan / TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
☆40Updated 9 months ago
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆84Updated 11 months ago
zhourax / VEGA
☆37Updated 11 months ago
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆123Updated 7 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆62Updated 2 months ago
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆131Updated last year
thunlp / Muffin
☆64Updated last year
jinbo0906 / Awesome-MLLM-Datasets
This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …
☆48Updated last month