ParadoxZW / LLaVA-UHD-BetterLinks

A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo

☆34

Alternatives and similar repositories for LLaVA-UHD-Better

Users that are interested in LLaVA-UHD-Better are comparing it to the libraries listed below

Sorting:

Liuziyu77 / MMDU
Official repository of MMDU dataset
☆98Updated last year
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆60Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147Updated last year
vlf-silkie / VLFeedback
☆100Updated last year
X2FD / LVIS-INSTRUCT4V
☆133Updated last year
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆84Updated 10 months ago
RifleZhang / LLaVA-Hound-DPO
☆155Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆51Updated 8 months ago
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆58Updated last year
pengts / VW-LMM
☆25Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆277Updated last year
OpenGVLab / MMT-Bench
[ICML 2024] | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
☆115Updated last year
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆170Updated last month
thunlp / Muffin
☆66Updated last year
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆117Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆200Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76Updated last year
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆92Updated last year
MAmmoTH-VL / MAmmoTH-VL
(ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
☆48Updated 6 months ago
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆98Updated last year
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆83Updated last year
foundation-multimodal-models / CAPTURE
☆81Updated last year
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆125Updated 8 months ago
FreedomIntelligence / MLLM-Bench
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
☆72Updated last year
patrick-tssn / VideoHallucer
VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆38Updated last month
TideDra / VL-RLHF
A RLHF Infrastructure for Vision-Language Models
☆187Updated last year
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆58Updated 11 months ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated 2 years ago
XMUDeepLIT / LLaVE
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆73Updated 6 months ago