This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
☆118 · Jun 18, 2025 · Updated 9 months ago
Alternatives and similar repositories for Vision-LLM-Alignment
Users that are interested in Vision-LLM-Alignment are comparing it to the libraries listed below.
- This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF). ☆21 · Jul 2, 2024 · Updated last year
- Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation ☆28 · Jun 30, 2025 · Updated 8 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆47 · Nov 10, 2024 · Updated last year
- Preference Learning for LLaVA ☆59 · Nov 9, 2024 · Updated last year
- A RLHF Infrastructure for Vision-Language Models ☆197 · Nov 15, 2024 · Updated last year
- A repository used to organize content related to Large Speech(Audio) Model, including paper, data, applications, tools and so on. ☆28 · Nov 8, 2025 · Updated 4 months ago
- A list of conferences and journals relevant to machine translation ☆33 · Mar 17, 2022 · Updated 4 years ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆90 · Nov 13, 2024 · Updated last year
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆62 · Aug 23, 2024 · Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆53 · Oct 19, 2024 · Updated last year
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation" ☆14 · Nov 1, 2024 · Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆92 · Apr 30, 2024 · Updated last year
- The project for speech translation ☆12 · Sep 28, 2023 · Updated 2 years ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆86 · Nov 10, 2024 · Updated last year
- A benchmark for testing memorization abilities of LMs ☆22 · Oct 15, 2024 · Updated last year
- We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ☆21 · Jan 11, 2026 · Updated 2 months ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆1,380 · Feb 26, 2026 · Updated 3 weeks ago
- A new multi-task learning framework using Vision Transformers ☆11 · Jun 19, 2024 · Updated last year
- An introduction to basic concepts of Transformers and key techniques of their recent advances. ☆52 · Dec 21, 2023 · Updated 2 years ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆69 · May 31, 2024 · Updated last year
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆382 · Feb 23, 2025 · Updated last year
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025) ☆12 · Aug 11, 2025 · Updated 7 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆310 · May 21, 2025 · Updated 10 months ago
- ☆47 · Dec 30, 2024 · Updated last year
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". ☆12 · Oct 11, 2024 · Updated last year
- Explore the Multimodal “Aha Moment” on 2B Model ☆624 · Mar 18, 2025 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆87 · Oct 26, 2025 · Updated 5 months ago
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details. ☆14 · Mar 20, 2024 · Updated 2 years ago
- [BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models ☆15 · Nov 1, 2024 · Updated last year
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆35 · Jul 1, 2024 · Updated last year
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆100 · Jan 30, 2024 · Updated 2 years ago
- A tool for translating the content of LaTeX documents into various other natural languages (e.g., translating an arXiv paper from English… ☆449 · Mar 12, 2026 · Updated last week
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation, ECCV 2024 ☆22 · Feb 15, 2024 · Updated 2 years ago
- [CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources ☆217 · Sep 26, 2025 · Updated 6 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆306 · Sep 11, 2024 · Updated last year
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆14 · Jun 21, 2024 · Updated last year
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos ☆23 · Jan 26, 2026 · Updated last month
- Aligning LMMs with Factually Augmented RLHF ☆393 · Nov 1, 2023 · Updated 2 years ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year