RifleZhang / LLaVA-Hound-DPOLinks

☆155

Alternatives and similar repositories for LLaVA-Hound-DPO

Users that are interested in LLaVA-Hound-DPO are comparing it to the libraries listed below

Sorting:

llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆125Updated 8 months ago
X2FD / LVIS-INSTRUCT4V
☆133Updated last year
longvideobench / LongVideoBench
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆111Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆51Updated 8 months ago
imagegridworth / IG-VLM
☆140Updated last year
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆291Updated last year
egoschema / EgoSchema
☆104Updated 11 months ago
vlf-silkie / VLFeedback
☆100Updated last year
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆98Updated last year
thunlp / Muffin
☆66Updated last year
AoiDragon / POPE
[EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆99Updated 3 months ago
zai-org / LVBench
[ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark
☆127Updated 4 months ago
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆98Updated last year
PKU-YuanGroup / Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
☆135Updated last year
yonseivnl / vlm-rlaif
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
☆76Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆277Updated last year
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated 2 years ago
TideDra / VL-RLHF
A RLHF Infrastructure for Vision-Language Models
☆187Updated last year
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆96Updated last year
RUCAIBox / POPE
The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆231Updated 3 months ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆170Updated last month
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆164Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76Updated last year
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆200Updated last year
OpenGVLab / MMT-Bench
[ICML 2024] | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
☆115Updated last year
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆212Updated last year
mu-cai / TemporalBench
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
☆37Updated last year
pipilurj / bootstrapped-preference-optimization-BPO
code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆59Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147Updated last year
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
☆234Updated 8 months ago