This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
☆117 · Jun 18, 2025 · Updated 8 months ago
Alternatives and similar repositories for Vision-LLM-Alignment
Users who are interested in Vision-LLM-Alignment are comparing it to the libraries listed below.
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆47 · Nov 10, 2024 · Updated last year
- Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation ☆28 · Jun 30, 2025 · Updated 8 months ago
- An RLHF Infrastructure for Vision-Language Models ☆196 · Nov 15, 2024 · Updated last year
- Preference Learning for LLaVA ☆59 · Nov 9, 2024 · Updated last year
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆60 · Aug 23, 2024 · Updated last year
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆86 · Nov 10, 2024 · Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆91 · Apr 30, 2024 · Updated last year
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆20 · Jan 11, 2026 · Updated last month
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆52 · Oct 19, 2024 · Updated last year
- ☆46 · Dec 30, 2024 · Updated last year
- A list of conferences and journals relevant to machine translation ☆33 · Mar 17, 2022 · Updated 3 years ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆69 · May 31, 2024 · Updated last year
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆1,360 · Updated this week
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆100 · Jan 30, 2024 · Updated 2 years ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆381 · Feb 23, 2025 · Updated last year
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024 ☆22 · Feb 15, 2024 · Updated 2 years ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year
- A new multi-task learning framework using Vision Transformers ☆11 · Jun 19, 2024 · Updated last year
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation" ☆14 · Nov 1, 2024 · Updated last year
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025) ☆11 · Aug 11, 2025 · Updated 6 months ago
- Building an inclusive, scalable, and high-performance multilingual translation model ☆121 · Jan 22, 2026 · Updated last month
- The project for speech translation ☆12 · Sep 28, 2023 · Updated 2 years ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆46 · Apr 29, 2024 · Updated last year
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆216 · Sep 26, 2025 · Updated 5 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆623 · Mar 18, 2025 · Updated 11 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆307 · Sep 11, 2024 · Updated last year
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆310 · May 21, 2025 · Updated 9 months ago
- Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework" ☆19 · Jan 18, 2026 · Updated last month
- PyTorch implementation of: "Continual Semantic Segmentation via Structure Preserving and Projected Feature Alignment", ECCV22 ☆11 · Jul 22, 2022 · Updated 3 years ago
- This is an experiment using a depth camera to control a stepping motor, aiming to drive the focus ring and achieve auto focus. ☆14 · Oct 21, 2023 · Updated 2 years ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models" ☆21 · Jul 21, 2025 · Updated 7 months ago
- Implementation of the paper LIMITR: Leveraging Local Information for Medical Image-Text Representation ☆17 · Feb 8, 2024 · Updated 2 years ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆18 · Feb 29, 2024 · Updated 2 years ago
- ☆11 · Nov 17, 2022 · Updated 3 years ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆94 · Sep 14, 2024 · Updated last year
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆91 · Nov 15, 2024 · Updated last year
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆86 · Mar 21, 2024 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆86 · Oct 26, 2025 · Updated 4 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆90 · Nov 13, 2024 · Updated last year