YuxiXie / V-DPO
Preference Learning for LLaVA
☆59 · Nov 9, 2024 · Updated last year
Alternatives and similar repositories for V-DPO
Users who are interested in V-DPO are comparing it to the repositories listed below.
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆85 · Oct 26, 2025 · Updated 3 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆47 · Nov 10, 2024 · Updated last year
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆100 · Jan 30, 2024 · Updated 2 years ago
- A RLHF Infrastructure for Vision-Language Models ☆196 · Nov 15, 2024 · Updated last year
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆85 · Nov 10, 2024 · Updated last year
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆33 · Oct 12, 2024 · Updated last year
- [ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination ☆19 · Jan 27, 2025 · Updated last year
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆57 · Oct 28, 2024 · Updated last year
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆133 · Sep 11, 2025 · Updated 5 months ago
- This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vi… ☆117 · Jun 18, 2025 · Updated 7 months ago
- [NeurIPS 2024] WATT: Weight Average Test-Time Adaptation of CLIP ☆56 · Sep 26, 2024 · Updated last year
- [CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding" ☆50 · Jun 16, 2025 · Updated 8 months ago
- Code for the paper "RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection" (ACL'25). ☆33 · Jul 23, 2025 · Updated 6 months ago
- We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆20 · Jan 11, 2026 · Updated last month
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention ☆61 · Jul 16, 2024 · Updated last year
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆55 · Mar 31, 2025 · Updated 10 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆46 · Dec 1, 2024 · Updated last year
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆23 · Nov 25, 2025 · Updated 2 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆306 · Sep 11, 2024 · Updated last year
- Visual self-questioning for large vision-language assistant. ☆45 · Jul 23, 2025 · Updated 6 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆72 · Jul 10, 2024 · Updated last year
- KAIST medical VL research group ☆20 · Dec 20, 2024 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆32 · Mar 26, 2025 · Updated 10 months ago
- Repository for Skill Set Optimization ☆14 · Jul 26, 2024 · Updated last year
- ☆15 · Sep 11, 2025 · Updated 5 months ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆20 · Aug 1, 2025 · Updated 6 months ago
- [Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders. ☆14 · Jun 7, 2023 · Updated 2 years ago
- Recent Advances on MLLM's Reasoning Ability ☆26 · Apr 11, 2025 · Updated 10 months ago
- [NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models' ☆204 · Jul 17, 2025 · Updated 6 months ago
- An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation ☆153 · Jan 15, 2024 · Updated 2 years ago
- Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection ☆19 · Feb 5, 2026 · Updated last week
- Project for SNARE benchmark ☆11 · Jun 5, 2024 · Updated last year
- ☆11 · Oct 2, 2024 · Updated last year
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data ☆14 · Sep 30, 2023 · Updated 2 years ago
- Benchmarking Multi-Image Understanding in Vision and Language Models ☆12 · Jul 29, 2024 · Updated last year
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning ☆26 · Oct 13, 2022 · Updated 3 years ago
- [EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction ☆51 · Aug 20, 2022 · Updated 3 years ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆50 · Oct 23, 2024 · Updated last year
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆424 · Dec 22, 2024 · Updated last year