zwq2018 / Multi-modal-Self-instructLinks

The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

☆82

Alternatives and similar repositories for Multi-modal-Self-instruct

Users that are interested in Multi-modal-Self-instruct are comparing it to the libraries listed below

Sorting:

yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆55Updated 8 months ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆47Updated 8 months ago
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆92Updated 10 months ago
vlf-silkie / VLFeedback
☆100Updated last year
mlfoundations / VisIT-Bench
☆50Updated last year
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆77Updated last year
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆116Updated 8 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆62Updated 3 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆54Updated 7 months ago
findalexli / mllm-dpo
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆46Updated 8 months ago
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆79Updated 8 months ago
RifleZhang / LLaVA-Reasoner-DPO
☆85Updated 6 months ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆69Updated last year
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆167Updated 4 months ago
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆85Updated 10 months ago
FudanDISC / ReForm-Eval
An benchmark for evaluating the capabilities of large vision-language models (LVLMs)
☆46Updated last year
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆48Updated 4 months ago
shiqichen17 / VLM_Merging
Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆67Updated 2 months ago
thunlp / Muffin
☆66Updated last year
NUS-TRAIL / NoisyRollout
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆83Updated last month
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆57Updated 10 months ago
pipilurj / bootstrapped-preference-optimization-BPO
code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆57Updated 11 months ago
patrick-tssn / VideoHallucer
VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆36Updated 4 months ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
☆131Updated last month
opendatalab / HA-DPO
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆90Updated last year
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated last month
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆80Updated 6 months ago
core-mm / core-mm
☆17Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆144Updated 8 months ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆47Updated last year